Autonomy Software C++ 24.5.1
Welcome to the Autonomy Software repository of the Mars Rover Design Team (MRDT) at Missouri University of Science and Technology (Missouri S&T)! This API reference contains the source code and other resources for the development of the autonomy software for our Mars rover. The Autonomy Software project aims to compete in the University Rover Challenge (URC) by demonstrating advanced autonomous capabilities and robust navigation algorithms.
yolomodel::tensorflow::TPUInterpreter Class Reference

This class is designed to enable quick, easy, and robust inferencing of a .tflite YOLO model. More...

#include <YOLOModel.hpp>


Public Member Functions

 TPUInterpreter (std::string szModelPath, PerformanceModes ePowerMode=PerformanceModes::eHigh, unsigned int unMaxBulkInQueueLength=32, bool bUSBAlwaysDFU=false)
 Construct a new TPUInterpreter object.
 
 ~TPUInterpreter ()
 Destroy the TPUInterpreter object.
 
std::vector< std::vector< Detection > > Inference (const cv::Mat &cvInputFrame, const float fMinObjectConfidence=0.85, const float fNMSThreshold=0.6) override
 Given an input image, forward the image through the YOLO model to run inference on the EdgeTPU, then parse and repackage the output tensor data into a vector of easy-to-use Detection structs.
 
- Public Member Functions inherited from TensorflowTPU< std::vector< std::vector< Detection > >, cv::Mat >
 TensorflowTPU (std::string szModelPath, PerformanceModes ePowerMode=PerformanceModes::eHigh, unsigned int unMaxBulkInQueueLength=32, bool bUSBAlwaysDFU=false)
 Construct a new TensorflowTPU object.
 
 ~TensorflowTPU ()
 Destroy the TensorflowTPU object.
 
void CloseHardware ()
 Release all hardware and reset models and interpreters.
 
TfLiteStatus OpenAndLoad (DeviceType eDeviceType=DeviceType::eAuto)
 Attempt to open the model at the given path and load it onto the EdgeTPU device.
 
bool GetDeviceIsOpened () const
 Accessor for the Device Is Opened private member.
 

Private Member Functions

void ParseTensorOutputYOLOv5 (int nOutputIndex, std::vector< int > &vClassIDs, std::vector< float > &vClassConfidences, std::vector< cv::Rect > &vBoundingBoxes, float fMinObjectConfidence, int nOriginalFrameWidth, int nOriginalFrameHeight)
 Given a TFLite output tensor from a YOLOv5 model, parse its output into something more usable. The parsed output will be in the form of three vectors: one for class IDs, one for the prediction confidence for the class ID, and one for cv::Rects storing the bounding box data for the prediction. A prediction will line up between the three vectors. (vClassIDs[0], vClassConfidences[0], and vBoundingBoxes[0] correspond to the same prediction.)
 
void ParseTensorOutputYOLOv8 (int nOutputIndex, std::vector< int > &vClassIDs, std::vector< float > &vClassConfidences, std::vector< cv::Rect > &vBoundingBoxes, float fMinObjectConfidence, int nOriginalFrameWidth, int nOriginalFrameHeight)
 Given a TFLite output tensor from a YOLOv8 model, parse its output into something more usable. The parsed output will be in the form of three vectors: one for class IDs, one for the prediction confidence for the class ID, and one for cv::Rects storing the bounding box data for the prediction. A prediction will line up between the three vectors. (vClassIDs[0], vClassConfidences[0], and vBoundingBoxes[0] correspond to the same prediction.)
 
InputTensorDimensions GetInputShape (const int nTensorIndex=0)
 Get the input shape of the tensor at the given index. Requires the device to have been successfully opened.
 
OutputTensorDimensions GetOutputShape (const int nTensorIndex=0)
 Get the output shape of the tensor at the given index. Requires the device to have been successfully opened.
 

Private Attributes

cv::Mat m_cvFrame
 

Additional Inherited Members

- Public Types inherited from TensorflowTPU< std::vector< std::vector< Detection > >, cv::Mat >
enum  DeviceType
 
enum  PerformanceModes
 
- Static Public Member Functions inherited from TensorflowTPU< std::vector< std::vector< Detection > >, cv::Mat >
static std::vector< edgetpu::EdgeTpuManager::DeviceEnumerationRecord > GetHardwareDevices ()
 Retrieve a list of EdgeTPU devices from the edge API.
 
static std::vector< std::shared_ptr< edgetpu::EdgeTpuContext > > GetOpenedHardwareDevices ()
 Retrieve a list of already opened EdgeTPU devices from the edge API.
 
- Protected Member Functions inherited from TensorflowTPU< std::vector< std::vector< Detection > >, cv::Mat >
edgetpu::EdgeTpuManager * GetEdgeManager ()
 Retrieves a pointer to an EdgeTPUManager instance from the libedgetpu library.
 
std::string DeviceTypeToString (edgetpu::DeviceType eDeviceType)
 to_string method for converting a device type to a readable string.
 
- Protected Attributes inherited from TensorflowTPU< std::vector< std::vector< Detection > >, cv::Mat >
std::string m_szModelPath
 
edgetpu::EdgeTpuManager::DeviceEnumerationRecord m_tpuDevice
 
edgetpu::EdgeTpuManager::DeviceOptions m_tpuDeviceOptions
 
std::unique_ptr< tflite::FlatBufferModel > m_pTFLiteModel
 
std::shared_ptr< edgetpu::EdgeTpuContext > m_pEdgeTPUContext
 
std::unique_ptr< tflite::Interpreter > m_pInterpreter
 
bool m_bDeviceOpened
 

Detailed Description

This class is designed to enable quick, easy, and robust inferencing of a .tflite YOLO model.

Bug:
This class correctly interfaces with the TPU, loads models, and runs inference, but parsing the output received from inference produces garbage.
Author
clayjay3 (claytonraycowen@gmail.com)
Date
2023-10-24
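
A minimal usage sketch of the intended lifecycle (not taken from the repository; the model path and image file are hypothetical, Detection is assumed to live in the yolomodel namespace, and kTfLiteOk is assumed to be visible through YOLOModel.hpp):

#include "YOLOModel.hpp"

#include <opencv2/opencv.hpp>

int main()
{
    // Construct an interpreter for an EdgeTPU-compiled model (hypothetical path).
    yolomodel::tensorflow::TPUInterpreter tpuInterpreter("models/example_edgetpu.tflite");

    // Open the EdgeTPU device and load the model onto it.
    if (tpuInterpreter.OpenAndLoad() != kTfLiteOk)
    {
        return 1;
    }

    // Inference() expects an RGB frame; OpenCV loads BGR, so convert first.
    cv::Mat cvFrame = cv::imread("frame.jpg");
    cv::cvtColor(cvFrame, cvFrame, cv::COLOR_BGR2RGB);
    std::vector<std::vector<yolomodel::Detection>> vTensorOutputs = tpuInterpreter.Inference(cvFrame, 0.85f, 0.6f);

    // Release the hardware when finished.
    tpuInterpreter.CloseHardware();
    return 0;
}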

Constructor & Destructor Documentation

◆ TPUInterpreter()

yolomodel::tensorflow::TPUInterpreter::TPUInterpreter ( std::string  szModelPath,
PerformanceModes  ePowerMode = PerformanceModes::eHigh,
unsigned int  unMaxBulkInQueueLength = 32,
bool  bUSBAlwaysDFU = false 
)
inline

Construct a new TPUInterpreter object.

Parameters
szModelPath - The path to the model to open and inference on the EdgeTPU.
ePowerMode - The desired power mode of the device.
unMaxBulkInQueueLength - Input queue length for device. Larger queue may improve USB performance going from device to host.
bUSBAlwaysDFU - Whether or not to always reload firmware into the device after this object is created.
Note
The given model must be a .tflite model custom-compiled to map operations to the EdgeTPU. Refer to https://coral.ai/docs/edgetpu/models-intro/#compiling and https://coral.ai/docs/edgetpu/compiler/#system-requirements
Author
clayjay3 (claytonraycowen@gmail.com)
Date
2023-11-11
    : TensorflowTPU<std::vector<std::vector<Detection>>, cv::Mat>(szModelPath, ePowerMode, unMaxBulkInQueueLength, bUSBAlwaysDFU)
{}
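
A hedged construction sketch spelling out all four parameters with their documented defaults (the model path is hypothetical; PerformanceModes is inherited from TensorflowTPU):

// Construct with every parameter explicit (values shown are the documented defaults).
yolomodel::tensorflow::TPUInterpreter tpuInterpreter(
    "models/example_edgetpu.tflite",                                  // szModelPath (hypothetical)
    yolomodel::tensorflow::TPUInterpreter::PerformanceModes::eHigh,   // ePowerMode
    32,                                                               // unMaxBulkInQueueLength
    false);                                                           // bUSBAlwaysDFU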

◆ ~TPUInterpreter()

yolomodel::tensorflow::TPUInterpreter::~TPUInterpreter ( )
inline

Destroy the TPUInterpreter object.

Author
clayjay3 (claytonraycowen@gmail.com)
Date
2023-10-24
{
    // Nothing to destroy.
}

Member Function Documentation

◆ Inference()

std::vector< std::vector< Detection > > yolomodel::tensorflow::TPUInterpreter::Inference ( const cv::Mat &  cvInputFrame,
const float  fMinObjectConfidence = 0.85,
const float  fNMSThreshold = 0.6 
)
inlineoverridevirtual

Given an input image, forward the image through the YOLO model to run inference on the EdgeTPU, then parse and repackage the output tensor data into a vector of easy-to-use Detection structs.

Parameters
cvInputFrame - The RGB camera frame to run detection on.
fMinObjectConfidence - Minimum confidence required for an object to be considered a valid detection.
fNMSThreshold - Threshold for Non-Maximum Suppression, controlling overlap between bounding box predictions.
Returns
std::vector<std::vector<Detection>> - A 2D vector of structs containing information about the valid object detections in the given image. There will be an std::vector<Detection> for each output tensor.
Note
The input image MUST BE in RGB format; otherwise you will likely experience prediction accuracy problems.
This function can automatically decode output from YOLOv5 and YOLOv8 models.
Author
clayjay3 (claytonraycowen@gmail.com)
Date
2023-11-13

Implements TensorflowTPU< std::vector< std::vector< Detection > >, cv::Mat >.

{
    // Create instance variables.
    std::vector<std::vector<Detection>> vTensorObjectOutputs;

    // Get the input tensor shape for the model.
    InputTensorDimensions stInputDimensions = this->GetInputShape(m_pInterpreter->inputs()[0]);

    // Copy given frame to class member variable.
    m_cvFrame = cvInputFrame;

    // Check if model is open and device is ready.
    if (m_bDeviceOpened && m_pEdgeTPUContext->IsReady())
    {
        // Check if the image has the correct type.
        if (m_cvFrame.type() != CV_8UC3)
        {
            // Convert image to unsigned int8 image.
            m_cvFrame.convertTo(m_cvFrame, CV_8UC3);
        }

        // Check if the input image matches the input tensor shape.
        if (m_cvFrame.rows != stInputDimensions.nHeight || m_cvFrame.cols != stInputDimensions.nWidth)
        {
            // Resize the image, and store a local copy of it.
            cv::resize(m_cvFrame,
                       m_cvFrame,
                       cv::Size(stInputDimensions.nWidth, stInputDimensions.nHeight),
                       constants::BASICCAM_RESIZE_INTERPOLATION_METHOD);
        }

        // Create a vector to store reshaped input image in 1 dimension.
        std::vector<int8_t> vInputData(m_cvFrame.data,
                                       m_cvFrame.data + (static_cast<unsigned long>(m_cvFrame.cols) * m_cvFrame.rows * m_cvFrame.elemSize()));
        // Quantize input data.
        // for (long unsigned int nIter = 0; nIter < vInputData.size(); ++nIter)
        // {
        //     // Quantize value.
        //     vInputData[nIter] = std::round((vInputData[nIter] - 128) / stInputDimensions.fQuantScale) + stInputDimensions.nQuantZeroPoint;
        //     // vInputData[nIter] = vInputData[nIter] - 128;
        // }
        // Retrieve a new input tensor from the TPU interpreter and copy data to it. This tensor is automatically quantized because it is typed.
        TfLiteTensor* pInputTensor = m_pInterpreter->tensor(stInputDimensions.nTensorIndex);
        std::memcpy(pInputTensor->data.raw, vInputData.data(), vInputData.size());

        // Run inference on the EdgeTPU.
        if (m_pInterpreter->Invoke() != kTfLiteOk)
        {
            // Submit logger message.
            LOG_WARNING(logging::g_qSharedLogger,
                        "Inferencing failed on an image for model {} with device {} ({})",
                        m_szModelPath,
                        m_tpuDevice.path,
                        this->DeviceTypeToString(m_tpuDevice.type));
        }
        else
        {
            // Create separate vectors for storing class confidences, bounding boxes, and classIDs.
            std::vector<int> vClassIDs;
            std::vector<float> vClassConfidences;
            std::vector<cv::Rect> vBoundingBoxes;
            // Create vector for storing all detections for this tensor output.
            std::vector<Detection> vObjects;

            // Get output indices for output tensors.
            for (int nTensorIndex : m_pInterpreter->outputs())
            {
                // Clear prediction data vectors.
                vClassIDs.clear();
                vClassConfidences.clear();
                vBoundingBoxes.clear();
                // Clear object detections vector.
                vObjects.clear();

                /*
                    Check if the output tensor has a YOLOv5 format.
                */
                // Get the tensor output shape details.
                OutputTensorDimensions stOutputDimensions = this->GetOutputShape(nTensorIndex);
                // Calculate the general stride sizes for YOLO based on input tensor shape.
                int nImgSize = stInputDimensions.nHeight;
                int nP3Stride = std::pow((nImgSize / 8), 2);
                int nP4Stride = std::pow((nImgSize / 16), 2);
                int nP5Stride = std::pow((nImgSize / 32), 2);
                // Calculate the proper prediction length for different YOLO versions.
                int nYOLOv5AnchorsPerGridPoint = 3;
                int nYOLOv8AnchorsPerGridPoint = 1;
                int nYOLOv5TotalPredictionLength =
                    (nP3Stride * nYOLOv5AnchorsPerGridPoint) + (nP4Stride * nYOLOv5AnchorsPerGridPoint) + (nP5Stride * nYOLOv5AnchorsPerGridPoint);
                int nYOLOv8TotalPredictionLength =
                    (nP3Stride * nYOLOv8AnchorsPerGridPoint) + (nP4Stride * nYOLOv8AnchorsPerGridPoint) + (nP5Stride * nYOLOv8AnchorsPerGridPoint);

                // Output tensor is YOLOv5 format.
                if (stOutputDimensions.nAnchors == nYOLOv5TotalPredictionLength)
                {
                    // Parse inferenced output from tensor.
                    this->ParseTensorOutputYOLOv5(nTensorIndex,
                                                  vClassIDs,
                                                  vClassConfidences,
                                                  vBoundingBoxes,
                                                  fMinObjectConfidence,
                                                  cvInputFrame.cols,
                                                  cvInputFrame.rows);
                }
                // Output tensor is YOLOv8 format.
                else if (stOutputDimensions.nAnchors == nYOLOv8TotalPredictionLength)
                {
                    // Parse inferenced output from tensor.
                    this->ParseTensorOutputYOLOv8(nTensorIndex,
                                                  vClassIDs,
                                                  vClassConfidences,
                                                  vBoundingBoxes,
                                                  fMinObjectConfidence,
                                                  cvInputFrame.cols,
                                                  cvInputFrame.rows);
                }

                // Perform NMS to filter out bad/duplicate detections.
                NonMaxSuppression(vObjects, vClassIDs, vClassConfidences, vBoundingBoxes, fMinObjectConfidence, fNMSThreshold);

                // Append object detections to the tensor outputs vector.
                vTensorObjectOutputs.emplace_back(vObjects);
            }
        }
    }
    else
    {
        // Submit logger message.
        LOG_WARNING(logging::g_qSharedLogger,
                    "Inferencing failed on an image for model {} with device {} ({})",
                    m_szModelPath,
                    m_tpuDevice.path,
                    this->DeviceTypeToString(m_tpuDevice.type));
    }

    return vTensorObjectOutputs;
}
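
A short sketch of consuming the return value, continuing the usage example in the Detailed Description; it relies only on the documented shape of the result (one std::vector<Detection> per output tensor) and assumes <iostream> is included:

// Count the valid detections produced by each output tensor.
std::vector<std::vector<yolomodel::Detection>> vTensorOutputs = tpuInterpreter.Inference(cvFrame);
for (size_t szTensorIndex = 0; szTensorIndex < vTensorOutputs.size(); ++szTensorIndex)
{
    std::cout << "Output tensor " << szTensorIndex << ": " << vTensorOutputs[szTensorIndex].size() << " detections" << std::endl;
}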

◆ ParseTensorOutputYOLOv5()

void yolomodel::tensorflow::TPUInterpreter::ParseTensorOutputYOLOv5 ( int  nOutputIndex,
std::vector< int > &  vClassIDs,
std::vector< float > &  vClassConfidences,
std::vector< cv::Rect > &  vBoundingBoxes,
float  fMinObjectConfidence,
int  nOriginalFrameWidth,
int  nOriginalFrameHeight 
)
inlineprivate

Given a TFLite output tensor from a YOLOv5 model, parse its output into something more usable. The parsed output will be in the form of three vectors: one for class IDs, one for the prediction confidence for the class ID, and one for cv::Rects storing the bounding box data for the prediction. A prediction will line up between the three vectors. (vClassIDs[0], vClassConfidences[0], and vBoundingBoxes[0] correspond to the same prediction.)

Parameters
nOutputIndex - The output tensor index from the model containing inference data.
vClassIDs - A reference to a vector that will be filled with class IDs for each prediction. The class ID of a prediction will be chosen by the highest class confidence for that prediction.
vClassConfidences - A reference to a vector that will be filled with the highest class confidence for each prediction.
vBoundingBoxes - A reference to a vector that will be filled with a cv::Rect bounding box for each prediction.
fMinObjectConfidence - The minimum confidence for determining which predictions to throw out.
nOriginalFrameWidth - The pixel width of the normal/original camera frame. This is not the size of the model input or resized image.
nOriginalFrameHeight - The pixel height of the normal/original camera frame. This is not the size of the model input or resized image.
Note
YOLOv5 predicts 25200 grid_cells when fed a (3, 640, 640) image (three detection layers for small, medium, and large objects, the same size as the input with the same bit depth). Each grid_cell is a vector composed of (5 + num_classes) values, where the 5 values are [objectness_score, Xc, Yc, W, H]. The output would be [1, 25200, 13] for a model with eight classes and a 640x640 input size.

Check out https://pub.towardsai.net/yolov5-m-implementation-from-scratch-with-pytorch-c8f84a66c98b for some great info.

Author
clayjay3 (claytonraycowen@gmail.com)
Date
2023-11-15
{
    // Retrieve output tensor from interpreter.
    TfLiteTensor* tfOutputTensor = m_pInterpreter->tensor(nOutputIndex);
    // Get output tensor shape.
    OutputTensorDimensions stOutputDimensions = this->GetOutputShape(nOutputIndex);
    // Create vector for storing temporary values for this prediction.
    std::vector<float> vGridPrediction;
    // Resize the grid prediction vector to match the number of classes + bounding_box + objectness score.
    vGridPrediction.resize(stOutputDimensions.nObjectnessLocationClasses);

    /*
        Loop through each grid cell output of the model output and filter out objects that don't meet conf thresh.
        Then, repackage into nice detection structs.
        For YOLOv5, you divide your image size, i.e. 640, by the P3, P4, P5 output strides of 8, 16, 32 to arrive at grid sizes
        of 80x80, 40x40, 20x20. Each grid point has 3 anchors by default (anchor box values: small, medium, large), and each
        anchor contains a vector 5 + nc long, where nc is the number of classes the model has. So for a 640 image, the output
        tensor will be [1, 25200, 85].
    */
    for (int nIter = 0; nIter < stOutputDimensions.nAnchors; ++nIter)
    {
        // Get objectness confidence. This is the 5th value for each grid/anchor prediction. (4th index)
        float fObjectnessConfidence =
            (tfOutputTensor->data.uint8[(nIter * stOutputDimensions.nObjectnessLocationClasses) + 4] - stOutputDimensions.nQuantZeroPoint) *
            stOutputDimensions.fQuantScale;

        // Check if the object confidence is greater than or equal to the threshold.
        if (fObjectnessConfidence >= fMinObjectConfidence)
        {
            // Loop through the number of object info and class confidences in the 2nd dimension.
            // Predictions have format {center_x, center_y, width, height, object_conf, class0_conf, class1_conf, ...}
            for (int nJter = 0; nJter < stOutputDimensions.nObjectnessLocationClasses; ++nJter)
            {
                // Repackage value into a more usable vector. Also undo quantization of the data.
                vGridPrediction[nJter] =
                    (tfOutputTensor->data.uint8[(nIter * stOutputDimensions.nObjectnessLocationClasses) + nJter] - stOutputDimensions.nQuantZeroPoint) *
                    stOutputDimensions.fQuantScale;
            }

            // Find class ID based on which class confidence has the highest score.
            std::vector<float>::iterator pStartIterator = vGridPrediction.begin() + 5;
            std::vector<float>::iterator pMaxConfidence = std::max_element(pStartIterator, vGridPrediction.end());
            int nClassID = std::distance(pStartIterator, pMaxConfidence);
            // Get prediction confidence for class ID.
            float fClassConfidence = vGridPrediction[nClassID + 5];
            // Scale bounding box to match original input image size.
            cv::Rect cvBoundingBox;
            int nCenterX = vGridPrediction[0] * nOriginalFrameWidth;
            int nCenterY = vGridPrediction[1] * nOriginalFrameHeight;
            int nWidth = vGridPrediction[2] * nOriginalFrameWidth;
            int nHeight = vGridPrediction[3] * nOriginalFrameHeight;
            // Check if the width and height of the object are greater than zero.
            if (nWidth > 0 && nHeight > 0)
            {
                // Repackage bounding box data to be more readable.
                cvBoundingBox.x = int(nCenterX - (0.5 * nWidth));  // Rect.x is the top-left corner not center point.
                cvBoundingBox.y = int(nCenterY - (0.5 * nHeight)); // Rect.y is the top-left corner not center point.
                cvBoundingBox.width = nWidth;
                cvBoundingBox.height = nHeight;
                // Add data to vectors.
                vClassIDs.emplace_back(nClassID);
                vClassConfidences.emplace_back(fClassConfidence);
                vBoundingBoxes.emplace_back(cvBoundingBox);
            }
        }
    }
}
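
To make the note above concrete, here is the prediction-count arithmetic for a 640x640 input, mirroring the stride math computed in Inference():

// Worked stride arithmetic for a 640x640 YOLOv5 input.
int nP3Cells = (640 / 8) * (640 / 8);                         // 80x80 grid -> 6400 cells
int nP4Cells = (640 / 16) * (640 / 16);                       // 40x40 grid -> 1600 cells
int nP5Cells = (640 / 32) * (640 / 32);                       // 20x20 grid -> 400 cells
int nTotalPredictions = 3 * (nP3Cells + nP4Cells + nP5Cells); // 3 anchors per grid point -> 25200
// With 8 classes each prediction holds 5 + 8 = 13 values, giving the [1, 25200, 13] output tensor.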

◆ ParseTensorOutputYOLOv8()

void yolomodel::tensorflow::TPUInterpreter::ParseTensorOutputYOLOv8 ( int  nOutputIndex,
std::vector< int > &  vClassIDs,
std::vector< float > &  vClassConfidences,
std::vector< cv::Rect > &  vBoundingBoxes,
float  fMinObjectConfidence,
int  nOriginalFrameWidth,
int  nOriginalFrameHeight 
)
inlineprivate

Given a TFLite output tensor from a YOLOv8 model, parse its output into something more usable. The parsed output will be in the form of three vectors: one for class IDs, one for the prediction confidence for the class ID, and one for cv::Rects storing the bounding box data for the prediction. A prediction will line up between the three vectors. (vClassIDs[0], vClassConfidences[0], and vBoundingBoxes[0] correspond to the same prediction.)

Parameters
nOutputIndex - The output tensor index from the model containing inference data.
vClassIDs - A reference to a vector that will be filled with class IDs for each prediction. The class ID of a prediction will be chosen by the highest class confidence for that prediction.
vClassConfidences - A reference to a vector that will be filled with the highest class confidence for each prediction.
vBoundingBoxes - A reference to a vector that will be filled with a cv::Rect bounding box for each prediction.
fMinObjectConfidence - The minimum confidence for determining which predictions to throw out.
nOriginalFrameWidth - The pixel width of the normal/original camera frame. This is not the size of the model input or resized image.
nOriginalFrameHeight - The pixel height of the normal/original camera frame. This is not the size of the model input or resized image.
Note
For YOLOv8, you divide your image size, i.e. 640, by the P3, P4, P5 output strides of 8, 16, 32 to arrive at grid sizes of 80x80, 40x40, 20x20. Each grid point has 1 anchor, and each anchor contains a vector 4 + nc long, where nc is the number of classes the model has. So for a 640 image, the output tensor will be [1, 84, 8400] (80 classes). Notice how the larger dimension is swapped when compared to YOLOv5.
Author
clayjay3 (claytonraycowen@gmail.com)
Date
2023-11-15
{
    // Retrieve output tensor from interpreter.
    TfLiteTensor* tfOutputTensor = m_pInterpreter->tensor(nOutputIndex);
    // Get output tensor shape.
    OutputTensorDimensions stOutputDimensions = this->GetOutputShape(nOutputIndex);
    // Create vector for storing temporary values for this prediction.
    std::vector<float> vGridPrediction;
    // Resize the grid prediction vector to match the number of classes + bounding_box.
    vGridPrediction.resize(stOutputDimensions.nObjectnessLocationClasses);

    /*
        Loop through each grid cell output of the model output and filter out objects that don't meet conf thresh.
        Then, repackage into nice detection structs.
        For YOLOv8, you divide your image size, i.e. 640, by the P3, P4, P5 output strides of 8, 16, 32 to arrive at grid sizes
        of 80x80, 40x40, 20x20. Each grid point has 1 anchor, and each anchor contains a vector 4 + nc long, where nc is the number
        of classes the model has. So for a 640 image, the output tensor will be [1, 84, 8400] (80 classes). Notice how the larger
        dimension is swapped when compared to YOLOv5.
    */
    for (int nIter = 0; nIter < stOutputDimensions.nAnchors; ++nIter)
    {
        // Loop through the number of object info and class confidences in the 2nd dimension.
        // Predictions have format {center_x, center_y, width, height, class0_conf, class1_conf, ...}
        for (int nJter = 0; nJter < stOutputDimensions.nObjectnessLocationClasses; ++nJter)
        {
            // Repackage values into a more usable vector. Also undo quantization of the data.
            vGridPrediction[nJter] = (tfOutputTensor->data.int8[nIter + (nJter * stOutputDimensions.nAnchors)] - stOutputDimensions.nQuantZeroPoint) *
                                     stOutputDimensions.fQuantScale;
        }

        // Find class ID based on which class confidence has the highest score.
        std::vector<float>::iterator pStartIterator = vGridPrediction.begin() + 4;
        std::vector<float>::iterator pMaxConfidence = std::max_element(pStartIterator, vGridPrediction.end());
        int nClassID = std::distance(pStartIterator, pMaxConfidence);
        // Get prediction confidence for class ID.
        float fClassConfidence = vGridPrediction[nClassID + 4];

        // Check if class confidence meets threshold.
        if (fClassConfidence >= fMinObjectConfidence)
        {
            // Scale bounding box to match original input image size.
            cv::Rect cvBoundingBox;
            int nCenterX = vGridPrediction[0] * nOriginalFrameWidth;
            int nCenterY = vGridPrediction[1] * nOriginalFrameHeight;
            int nWidth = vGridPrediction[2] * nOriginalFrameWidth;
            int nHeight = vGridPrediction[3] * nOriginalFrameHeight;
            // Repackage bounding box data to be more readable.
            cvBoundingBox.x = int(nCenterX - (0.5 * nWidth));  // Rect.x is the top-left corner not center point.
            cvBoundingBox.y = int(nCenterY - (0.5 * nHeight)); // Rect.y is the top-left corner not center point.
            cvBoundingBox.width = nWidth;
            cvBoundingBox.height = nHeight;
            // Add data to vectors.
            vClassIDs.emplace_back(nClassID);
            vClassConfidences.emplace_back(fClassConfidence);
            vBoundingBoxes.emplace_back(cvBoundingBox);
        }
    }
}
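
The practical difference from the YOLOv5 parser is the transposed tensor layout, which changes the index math; a side-by-side sketch using the same variable names as the two implementations above:

// YOLOv5 output is [1, nAnchors, nValues]: one prediction's values are contiguous.
int nV5Index = (nIter * stOutputDimensions.nObjectnessLocationClasses) + nJter;
// YOLOv8 output is [1, nValues, nAnchors]: one prediction's values are strided nAnchors apart.
int nV8Index = nIter + (nJter * stOutputDimensions.nAnchors);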

◆ GetInputShape()

InputTensorDimensions yolomodel::tensorflow::TPUInterpreter::GetInputShape ( const int  nTensorIndex = 0)
inlineprivate

Get the input shape of the tensor at the given index. Requires the device to have been successfully opened.

Parameters
nTensorIndex - The index of the input tensor to use. YOLO models that have been converted to an edgetpu-quantized .tflite file will only have one input, at index 0.
Returns
InputTensorDimensions - A struct containing the height, width, and channels of the input tensor, plus its index and quantization parameters.
Author
clayjay3 (claytonraycowen@gmail.com)
Date
2023-11-12
{
    // Create instance variables.
    InputTensorDimensions stInputDimensions = {0, 0, 0, 0, 0, 0};

    // Check if interpreter has been built.
    if (m_bDeviceOpened)
    {
        // Get the desired input tensor shape of the model.
        TfLiteTensor* tfInputTensor = m_pInterpreter->tensor(nTensorIndex);
        TfLiteIntArray* tfDimensions = tfInputTensor->dims;

        // Package dimensions into struct.
        stInputDimensions.nHeight = tfDimensions->data[1];
        stInputDimensions.nWidth = tfDimensions->data[2];
        stInputDimensions.nChannels = tfDimensions->data[3];
        stInputDimensions.nTensorIndex = nTensorIndex;
        // Get the quantization zero point and scale for input tensor.
        stInputDimensions.nQuantZeroPoint = tfInputTensor->params.zero_point;
        stInputDimensions.fQuantScale = tfInputTensor->params.scale;
    }

    return stInputDimensions;
}
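A sketch of how this accessor is used from inside the class (it is private, so external callers cannot invoke it; this mirrors the first steps of Inference()):

// Query the model's expected input shape and resize the frame to match.
InputTensorDimensions stInputDimensions = this->GetInputShape(m_pInterpreter->inputs()[0]);
if (m_cvFrame.rows != stInputDimensions.nHeight || m_cvFrame.cols != stInputDimensions.nWidth)
{
    cv::resize(m_cvFrame, m_cvFrame, cv::Size(stInputDimensions.nWidth, stInputDimensions.nHeight));
}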

◆ GetOutputShape()

OutputTensorDimensions yolomodel::tensorflow::TPUInterpreter::GetOutputShape ( const int  nTensorIndex = 0)
inlineprivate

Get the output shape of the tensor at the given index. Requires the device to have been successfully opened.

Parameters
nTensorIndex - The index of the output tensor to use. YOLO models that have been converted to an edgetpu-quantized .tflite file will only have one output, at index 0.
Returns
OutputTensorDimensions - A struct containing the anchor count and per-prediction value count of the output tensor, plus its index and quantization parameters.
Author
clayjay3 (claytonraycowen@gmail.com)
Date
2023-11-12
{
    // Create instance variables.
    OutputTensorDimensions stOutputDimensions = {0, 0, 0, 0, 0};

    // Check if interpreter has been built.
    if (m_bDeviceOpened)
    {
        // Get the desired output tensor shape of the model.
        TfLiteTensor* tfOutputTensor = m_pInterpreter->tensor(nTensorIndex);
        TfLiteIntArray* tfDimensions = tfOutputTensor->dims;

        // Package dimensions into struct. Assume anchors will always be the longer dimension.
        stOutputDimensions.nAnchors = std::max(tfDimensions->data[1], tfDimensions->data[2]);
        stOutputDimensions.nObjectnessLocationClasses = std::min(tfDimensions->data[1], tfDimensions->data[2]);
        stOutputDimensions.nTensorIndex = nTensorIndex;
        // Get the quantization zero point and scale for output tensor.
        stOutputDimensions.nQuantZeroPoint = tfOutputTensor->params.zero_point;
        stOutputDimensions.fQuantScale = tfOutputTensor->params.scale;
    }

    return stOutputDimensions;
}

The documentation for this class was generated from the following file:
YOLOModel.hpp