Autonomy Software C++ 24.5.1
Welcome to the Autonomy Software repository of the Mars Rover Design Team (MRDT) at Missouri University of Science and Technology (Missouri S&T)! This API reference contains the source code and other resources for the development of the autonomy software for our Mars rover. The Autonomy Software project aims to compete in the University Rover Challenge (URC) by demonstrating advanced autonomous capabilities and robust navigation algorithms.
yolomodel::pytorch::PyTorchInterpreter Class Reference

This class is designed to enable quick, easy, and robust inference of .pt YOLO models. More...

Public Types

enum class  HardwareDevices { eCPU , eCUDA }
 

Public Member Functions

 PyTorchInterpreter (std::string szModelPath, HardwareDevices eHardwareDevice=HardwareDevices::eCUDA)
 Construct a new PyTorchInterpreter object.
 
 ~PyTorchInterpreter ()
 Destroy the PyTorchInterpreter object.
 
std::vector< Detection > Inference (const cv::Mat &cvInputFrame, const float fMinObjectConfidence=0.85, const float fNMSThreshold=0.6)
 Given an input image, forward it through the YOLO model to run inference on the PyTorch model, then parse and repackage the output tensor data into a vector of easy-to-use Detection structs.
 
bool IsReadyForInference () const
 Check if the model is ready for inference.
 

Private Member Functions

torch::Tensor PreprocessImage (const cv::Mat &cvInputFrame, const torch::Device &trDevice)
 Given an input image, preprocess the image to match the input tensor shape of the model, then return the preprocessed image as a tensor.
 
void ParseTensorOutputYOLOv5 (const torch::Tensor &trOutput, std::vector< int > &vClassIDs, std::vector< float > &vClassConfidences, std::vector< cv::Rect > &vBoundingBoxes, const cv::Size &cvInputFrameSize, const float fMinObjectConfidence)
 Given a tensor output from a YOLOv5 model, parse its output into something more usable.
 
void ParseTensorOutputYOLOv8 (const torch::Tensor &trOutput, std::vector< int > &vClassIDs, std::vector< float > &vClassConfidences, std::vector< cv::Rect > &vBoundingBoxes, const cv::Size &cvInputFrameSize, const float fMinObjectConfidence)
 Given a tensor output from a YOLOv8 model, parse its output into something more usable.
 

Private Attributes

torch::jit::script::Module m_trModel
 
torch::Device m_trDevice = torch::kCPU
 
std::string m_szModelPath
 
bool m_bReady
 
std::string m_szModelTask
 
cv::Size m_cvModelInputSize
 
std::vector< std::string > m_vClassLabels
 

Detailed Description

This class is designed to enable quick, easy, and robust inference of .pt YOLO models.

Author
clayjay3 (claytonraycowen@gmail.com)
Date
2025-01-06
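For orientation, here is a minimal usage sketch, not taken from the repository; the model path, frame source, and the yolomodel::Detection namespace are assumptions based on the signatures documented below:

#include <opencv2/opencv.hpp>
#include <vector>

// Hypothetical helper; assumes YOLOModel.hpp is included and the paths exist.
void RunExampleDetection()
{
    yolomodel::pytorch::PyTorchInterpreter trInterpreter("models/example.pt",
        yolomodel::pytorch::PyTorchInterpreter::HardwareDevices::eCUDA);

    cv::Mat cvFrame = cv::imread("frame.png");            // OpenCV loads images as BGR.
    cv::cvtColor(cvFrame, cvFrame, cv::COLOR_BGR2RGB);    // Inference() expects RGB input.

    if (trInterpreter.IsReadyForInference())
    {
        std::vector<yolomodel::Detection> vDetections = trInterpreter.Inference(cvFrame, 0.85f, 0.6f);
    }
}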

Member Enumeration Documentation

◆ HardwareDevices

enum class yolomodel::pytorch::PyTorchInterpreter::HardwareDevices
strong

{
    eCPU,    // The CPU device.
    eCUDA    // The CUDA device.
};
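A hedged example of requesting a specific device at construction (the model path is a placeholder); if eCUDA is requested but no CUDA device is available, the constructor logs the problem and falls back to the CPU:

// Force CPU execution, e.g. for machines without an NVIDIA GPU.
yolomodel::pytorch::PyTorchInterpreter trCpuInterpreter(
    "models/example.pt", yolomodel::pytorch::PyTorchInterpreter::HardwareDevices::eCPU);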

Constructor & Destructor Documentation

◆ PyTorchInterpreter()

yolomodel::pytorch::PyTorchInterpreter::PyTorchInterpreter (std::string szModelPath, HardwareDevices eHardwareDevice = HardwareDevices::eCUDA)
inline

Construct a new PyTorchInterpreter object.

Parameters
szModelPath- The path to the model to open and inference.
eHardwareDevice- The device to run the model on. Default is CUDA; the other supported option is CPU.
Author
clayjay3 (claytonraycowen@gmail.com)
Date
2025-01-06
{
    // Initialize member variables.
    m_szModelPath      = szModelPath;
    m_bReady           = false;
    m_cvModelInputSize = cv::Size(640, 640);
    m_szModelTask      = "Unknown";
    m_vClassLabels     = std::vector<std::string>();

    // Translate the hardware device enum to a torch device.
    switch (eHardwareDevice)
    {
        case HardwareDevices::eCPU: m_trDevice = torch::kCPU; break;
        case HardwareDevices::eCUDA: m_trDevice = torch::kCUDA; break;
        default: m_trDevice = torch::kCPU; break;
    }

    // Submit logger message.
    LOG_INFO(logging::g_qSharedLogger, "Attempting to load model {} onto device {}", szModelPath, m_trDevice.str());

    // Check if the model path is valid.
    if (!std::filesystem::exists(szModelPath))
    {
        // Submit logger message.
        LOG_ERROR(logging::g_qSharedLogger, "Model path {} does not exist!", szModelPath);
        return;
    }
    // Check if the device is available. If CUDA was requested but is unavailable, fall back to the CPU
    // and continue loading (returning here would leave the interpreter permanently not ready).
    if (!torch::cuda::is_available() && m_trDevice == torch::kCUDA)
    {
        // Submit logger message.
        LOG_WARNING(logging::g_qSharedLogger, "CUDA device is not available, falling back to CPU.");
        m_trDevice = torch::kCPU;
    }
    // Submit logger message.
    LOG_INFO(logging::g_qSharedLogger, "Using device: {}", m_trDevice.str());

    // Finally, attempt to load the model.
    try
    {
        // Load the model and set it to eval mode for inference.
        torch::jit::ExtraFilesMap trExtraConfigFiles{{"config.txt", ""}};
        m_trModel = torch::jit::load(szModelPath, m_trDevice, trExtraConfigFiles);
        m_trModel.eval();

        // Use nlohmann json to parse the config file.
        nlohmann::json jConfig = nlohmann::json::parse(trExtraConfigFiles.at("config.txt"));
        // Get the input image size for the model.
        m_cvModelInputSize = cv::Size(jConfig["imgsz"][0], jConfig["imgsz"][1]);
        m_szModelTask = jConfig["task"];
        for (const auto& item : jConfig["names"].items())
        {
            m_vClassLabels.push_back(item.value());
        }
        // Submit the config json as a debug message.
        LOG_DEBUG(logging::g_qSharedLogger, "Model config: {}", jConfig.dump(4));

        // Check if the model is empty.
        if (m_trModel.get_methods().empty())
        {
            LOG_ERROR(logging::g_qSharedLogger, "Model is empty! Check if the correct model file was provided.");
            return;
        }
        // Check that the model actually moved to the expected device.
        if (m_trModel.buffers().size() > 0)
        {
            // Get the device of the model.
            torch::Device trModelDevice = (*m_trModel.buffers().begin()).device();
            if (trModelDevice != m_trDevice)
            {
                LOG_ERROR(logging::g_qSharedLogger, "Model did not move to the expected device! Model is on: {}", trModelDevice.str());
                return;
            }
        }
        else
        {
            LOG_WARNING(logging::g_qSharedLogger, "Model has no buffers to check the device.");
        }

        // Model is ready for inference.
        LOG_INFO(logging::g_qSharedLogger,
                 "Model successfully loaded and set to eval mode. The model is a {} model, and has {} classes.",
                 m_szModelTask,
                 m_vClassLabels.size());

        // Set flag saying we are ready for inference.
        m_bReady = true;
    }
    catch (const c10::Error& trError)
    {
        LOG_ERROR(logging::g_qSharedLogger, "Error loading model: {}", trError.what());
    }
}
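The constructor relies on Ultralytics-exported TorchScript models embedding a JSON config as an extra file named config.txt. A standalone sketch of that extraction technique, assuming LibTorch and nlohmann::json are installed; the model path is a placeholder:

#include <torch/script.h>
#include <nlohmann/json.hpp>
#include <iostream>

int main()
{
    // Request the embedded config file while loading the TorchScript module.
    torch::jit::ExtraFilesMap trExtraFiles{{"config.txt", ""}};
    torch::jit::script::Module trModel = torch::jit::load("example.torchscript", torch::kCPU, trExtraFiles);
    trModel.eval();

    // The extra file is filled in-place by torch::jit::load; parse it as JSON.
    nlohmann::json jConfig = nlohmann::json::parse(trExtraFiles.at("config.txt"));
    std::cout << "task: " << jConfig["task"] << ", imgsz: " << jConfig["imgsz"][0] << std::endl;
    return 0;
}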

◆ ~PyTorchInterpreter()

yolomodel::pytorch::PyTorchInterpreter::~PyTorchInterpreter ( )
inline

Destroy the PyTorchInterpreter object.

Author
clayjay3 (claytonraycowen@gmail.com)
Date
2025-01-06
{
    // Nothing to destroy.
}

Member Function Documentation

◆ Inference()

std::vector< Detection > yolomodel::pytorch::PyTorchInterpreter::Inference (const cv::Mat& cvInputFrame, const float fMinObjectConfidence = 0.85, const float fNMSThreshold = 0.6)
inline

Given an input image, forward it through the YOLO model to run inference on the PyTorch model, then parse and repackage the output tensor data into a vector of easy-to-use Detection structs.

Parameters
cvInputFrame- The RGB camera frame to run detection on.
fMinObjectConfidence- Minimum confidence required for an object to be considered a valid detection.
fNMSThreshold- Threshold for Non-Maximum Suppression, controlling overlap between bounding box predictions.
Returns
std::vector<Detection> - A vector of structs containing information about the valid object detections in the given image.
Note
The input image MUST BE RGB format, otherwise you will likely experience prediction accuracy problems.
Author
clayjay3 (claytonraycowen@gmail.com)
Date
2025-01-06
{
    // Force single-threaded execution (if acceptable for your workload).
    torch::set_num_threads(1);
    // Create instance variables.
    std::vector<Detection> vObjects;

    // Preprocess the given image and pack it into an input tensor.
    torch::Tensor trTensorImage = PreprocessImage(cvInputFrame, m_trDevice);

    // Perform inference.
    std::vector<torch::jit::IValue> vInputs;
    vInputs.push_back(trTensorImage);
    torch::Tensor trOutputTensor;
    try
    {
        trOutputTensor = m_trModel.forward(vInputs).toTensor();
    }
    catch (const c10::Error& trError)
    {
        LOG_ERROR(logging::g_qSharedLogger, "Error running inference: {}", trError.what());
        return vObjects;
    }

    // Calculate the general stride sizes for YOLO based on input tensor shape.
    int nImgSize = m_cvModelInputSize.height;
    int nP3Stride = std::pow((nImgSize / 8), 2);
    int nP4Stride = std::pow((nImgSize / 16), 2);
    int nP5Stride = std::pow((nImgSize / 32), 2);
    // Calculate the proper prediction length for different YOLO versions.
    int nYOLOv5AnchorsPerGridPoint = 3;
    int nYOLOv8AnchorsPerGridPoint = 1;
    int nYOLOv5TotalPredictionLength =
        (nP3Stride * nYOLOv5AnchorsPerGridPoint) + (nP4Stride * nYOLOv5AnchorsPerGridPoint) + (nP5Stride * nYOLOv5AnchorsPerGridPoint);
    int nYOLOv8TotalPredictionLength =
        (nP3Stride * nYOLOv8AnchorsPerGridPoint) + (nP4Stride * nYOLOv8AnchorsPerGridPoint) + (nP5Stride * nYOLOv8AnchorsPerGridPoint);

    // Parse the output tensor.
    std::vector<int> vClassIDs;
    std::vector<std::string> vClassLabels;
    std::vector<float> vClassConfidences;
    std::vector<cv::Rect> vBoundingBoxes;

    // Get the largest dimension of our output tensor.
    int nLargestDimension = *std::max_element(trOutputTensor.sizes().begin(), trOutputTensor.sizes().end());
    // Check if the output tensor is YOLOv5 format.
    if (nLargestDimension == nYOLOv5TotalPredictionLength)
    {
        // Parse inferenced output from tensor.
        this->ParseTensorOutputYOLOv5(trOutputTensor, vClassIDs, vClassConfidences, vBoundingBoxes, cvInputFrame.size(), fMinObjectConfidence);
    }
    // Check if the output tensor is YOLOv8 format.
    else if (nLargestDimension == nYOLOv8TotalPredictionLength)
    {
        // Parse inferenced output from tensor.
        this->ParseTensorOutputYOLOv8(trOutputTensor, vClassIDs, vClassConfidences, vBoundingBoxes, cvInputFrame.size(), fMinObjectConfidence);
    }

    // Perform NMS to filter out bad/duplicate detections.
    NonMaxSuppression(vObjects, vClassIDs, vClassConfidences, vBoundingBoxes, fMinObjectConfidence, fNMSThreshold);

    // Loop through the final detections and set the class names for each detection based on the class ID.
    for (size_t nIter = 0; nIter < vObjects.size(); ++nIter)
    {
        // Check if the class ID is valid.
        if (vClassIDs[nIter] >= 0 && vClassIDs[nIter] < static_cast<int>(m_vClassLabels.size()))
        {
            vObjects[nIter].szClassName = m_vClassLabels[vClassIDs[nIter]];
        }
        else
        {
            vObjects[nIter].szClassName = "UnknownClass";
        }
    }

    return vObjects;
}
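As a follow-up, a sketch of consuming the returned vector. szClassName is confirmed by the implementation above, but the cvBoundingBox field name is an assumption following this file's naming conventions; check it against the actual Detection struct:

// Draw each detection onto the (RGB) frame.
std::vector<Detection> vDetections = trInterpreter.Inference(cvFrame, 0.85f, 0.6f);
for (const Detection& stDetection : vDetections)
{
    cv::rectangle(cvFrame, stDetection.cvBoundingBox, cv::Scalar(0, 255, 0), 2);
    cv::putText(cvFrame, stDetection.szClassName, stDetection.cvBoundingBox.tl(), cv::FONT_HERSHEY_SIMPLEX, 0.6, cv::Scalar(0, 255, 0), 1);
}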

◆ IsReadyForInference()

bool yolomodel::pytorch::PyTorchInterpreter::IsReadyForInference ( ) const
inline

Check if the model is ready for inference.

Returns
true - Model is ready for inference.
false - Model is not ready for inference.
Author
clayjay3 (claytonraycowen@gmail.com)
Date
2025-02-13
{ return m_bReady; }

◆ PreprocessImage()

torch::Tensor yolomodel::pytorch::PyTorchInterpreter::PreprocessImage (const cv::Mat& cvInputFrame, const torch::Device& trDevice)
inline private

Given an input image, preprocess the image to match the input tensor shape of the model, then return the preprocessed image as a tensor.

Parameters
cvInputFrame- The input image to preprocess.
trDevice- The device to run the model on.
Returns
torch::Tensor - The preprocessed image as a tensor.
Author
clayjay3 (claytonraycowen@gmail.com)
Date
2025-03-08
{
    // Resize the input image to match the model input and normalize it to 0-1.
    cv::Mat cvResizedImage;
    cv::resize(cvInputFrame, cvResizedImage, cv::Size(m_cvModelInputSize.width, m_cvModelInputSize.height), 0, 0, cv::INTER_LINEAR);
    cvResizedImage.convertTo(cvResizedImage, CV_32FC3, 1.0 / 255.0);

    // Convert OpenCV mat to a tensor.
    torch::Tensor trTensorImage = torch::from_blob(cvResizedImage.data, {1, cvResizedImage.rows, cvResizedImage.cols, 3}, torch::kFloat);
    // Convert from NHWC to NCHW format. The contiguous() call copies the data into
    // tensor-owned memory, since from_blob does not take ownership of the cv::Mat data.
    trTensorImage = trTensorImage.permute({0, 3, 1, 2}).contiguous();
    trTensorImage = trTensorImage.to(trDevice);    // Move tensor to the specified hardware device.

    return trTensorImage;
}
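A small self-contained sketch, assuming LibTorch and OpenCV, that reproduces the resize/normalize/permute pipeline above on a dummy frame and verifies the resulting tensor shape:

#include <torch/torch.h>
#include <opencv2/opencv.hpp>
#include <iostream>

int main()
{
    cv::Mat cvFrame(480, 640, CV_8UC3, cv::Scalar(0, 0, 0));    // Dummy 640x480 RGB frame.
    cv::Mat cvResized;
    cv::resize(cvFrame, cvResized, cv::Size(640, 640), 0, 0, cv::INTER_LINEAR);
    cvResized.convertTo(cvResized, CV_32FC3, 1.0 / 255.0);      // Normalize to 0-1 floats.

    // Wrap the mat data in a tensor, then permute NHWC -> NCHW.
    torch::Tensor trImage = torch::from_blob(cvResized.data, {1, cvResized.rows, cvResized.cols, 3}, torch::kFloat);
    trImage = trImage.permute({0, 3, 1, 2}).contiguous();
    std::cout << trImage.sizes() << std::endl;                  // Prints [1, 3, 640, 640].
    return 0;
}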

◆ ParseTensorOutputYOLOv5()

void yolomodel::pytorch::PyTorchInterpreter::ParseTensorOutputYOLOv5 (const torch::Tensor& trOutput, std::vector< int >& vClassIDs, std::vector< float >& vClassConfidences, std::vector< cv::Rect >& vBoundingBoxes, const cv::Size& cvInputFrameSize, const float fMinObjectConfidence)
inline private

Given a tensor output from a YOLOv5 model, parse its output into something more usable.

Parameters
trOutput- A reference to the output tensor from the model. The tensor should be of shape [1, 25200, 85] for YOLOv5.
vClassIDs- A reference to a vector that will be filled with class IDs for each prediction. The class ID of a prediction will be chosen by the highest class confidence for that prediction.
vClassConfidences- A reference to a vector that will be filled with the highest class confidence for each prediction.
vBoundingBoxes- A reference to a vector that will be filled with a cv::Rect bounding box for each prediction.
cvInputFrameSize- The size of the original input frame. This is used to scale the bounding boxes back to the original image size.
fMinObjectConfidence- The minimum confidence for determining which predictions to keep. Predictions with a confidence below this value will be discarded.
Author
clayjay3 (claytonraycowen@gmail.com)
Date
2025-03-13
{
    /*
     * For YOLOv5, you divide your image size, i.e. 640, by the P3, P4, P5 output strides of 8, 16, 32 to arrive at grid sizes
     * of 80x80, 40x40, 20x20. Each grid point has 3 anchors by default (anchor box values: small, medium, large), and each anchor
     * contains a vector 5 + nc long, where nc is the number of classes the model has. So for a 640 image with 80 classes, the
     * output tensor will be [1, 25200, 85].
     */
    // Squeeze the batch dimension from the output tensor.
    torch::Tensor trSqueezedOutput = trOutput.squeeze(0);

    // Move the tensor to the CPU if necessary. Element-wise access on a CUDA tensor would fail or be extremely slow.
    if (trSqueezedOutput.device().is_cuda())
    {
        trSqueezedOutput = trSqueezedOutput.to(torch::kCPU);
    }
    // Convert tensor to float if necessary.
    if (trSqueezedOutput.scalar_type() != torch::kFloat32)
    {
        trSqueezedOutput = trSqueezedOutput.to(torch::kFloat32);
    }
    // Ensure tensor is contiguous in memory.
    if (!trSqueezedOutput.is_contiguous())
    {
        trSqueezedOutput = trSqueezedOutput.contiguous();
    }

    // Create an accessor for fast element-wise access.
    at::TensorAccessor trAccessor = trSqueezedOutput.accessor<float, 2>();
    const int nNumDetections = trSqueezedOutput.size(0);
    const int nTotalValues = trSqueezedOutput.size(1);    // Equals 5 + number_of_classes.

    // Loop through each detection.
    for (int i = 0; i < nNumDetections; i++)
    {
        // Get the objectness confidence. This is the 5th value for each grid/anchor prediction. (4th index)
        float fObjectnessConfidence = trAccessor[i][4];

        // Check if the objectness confidence is greater than or equal to the threshold.
        if (fObjectnessConfidence < fMinObjectConfidence)
        {
            continue;
        }

        // Retrieve bounding box data.
        float fCenterX = trAccessor[i][0];
        float fCenterY = trAccessor[i][1];
        float fWidth = trAccessor[i][2];
        float fHeight = trAccessor[i][3];

        // Scale bounding box to original image size.
        int nLeft = static_cast<int>((fCenterX - (0.5 * fWidth)) * cvInputFrameSize.width);
        int nTop = static_cast<int>((fCenterY - (0.5 * fHeight)) * cvInputFrameSize.height);
        int nBoundingWidth = static_cast<int>(fWidth * cvInputFrameSize.width);
        int nBoundingHeight = static_cast<int>(fHeight * cvInputFrameSize.height);

        // Repackage bounding box data to be more readable.
        cv::Rect cvBoundingBox(nLeft, nTop, nBoundingWidth, nBoundingHeight);

        // Loop over class confidence values and find the class ID with the highest confidence.
        float fClassConfidence = -1.0f;
        int nClassID = -1;
        for (int j = 5; j < nTotalValues; j++)
        {
            float fConfidence = trAccessor[i][j];
            if (fConfidence > fClassConfidence)
            {
                fClassConfidence = fConfidence;
                nClassID = j - 5;
            }
        }

        // Only process detections that meet the minimum confidence.
        if (fClassConfidence < fMinObjectConfidence)
        {
            continue;
        }

        // Add data to vectors.
        vClassIDs.emplace_back(nClassID);
        vClassConfidences.emplace_back(fClassConfidence);
        vBoundingBoxes.emplace_back(cvBoundingBox);
    }
}
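A worked check of the prediction-count arithmetic described above, for a 640x640 input:

#include <iostream>

int main()
{
    int nImgSize = 640;
    int nP3 = (nImgSize / 8) * (nImgSize / 8);      // 80 x 80 = 6400 grid points.
    int nP4 = (nImgSize / 16) * (nImgSize / 16);    // 40 x 40 = 1600 grid points.
    int nP5 = (nImgSize / 32) * (nImgSize / 32);    // 20 x 20 = 400 grid points.
    int nAnchors = 3;                               // YOLOv5 anchors per grid point.
    std::cout << nAnchors * (nP3 + nP4 + nP5) << std::endl;    // Prints 25200.
    return 0;
}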

◆ ParseTensorOutputYOLOv8()

void yolomodel::pytorch::PyTorchInterpreter::ParseTensorOutputYOLOv8 (const torch::Tensor& trOutput, std::vector< int >& vClassIDs, std::vector< float >& vClassConfidences, std::vector< cv::Rect >& vBoundingBoxes, const cv::Size& cvInputFrameSize, const float fMinObjectConfidence)
inline private

Given a tensor output from a YOLOv8 model, parse its output into something more usable.

Parameters
trOutput- A reference to the output tensor from the model.
vClassIDs- A reference to a vector that will be filled with class IDs for each prediction. The class ID of a prediction will be chosen by the highest class confidence for that prediction.
vClassConfidences- A reference to a vector that will be filled with the highest class confidence for that prediction.
vBoundingBoxes- A reference to a vector that will be filled with a cv::Rect bounding box for each prediction.
cvInputFrameSize- The size of the original input frame before resizing. This is used to scale the bounding box back to the original size.
fMinObjectConfidence- The minimum confidence required for an object to be considered a valid detection.
Note
For YOLOv8, you divide your image size, i.e. 640, by the P3, P4, P5 output strides of 8, 16, 32 to arrive at grid sizes of 80x80, 40x40, 20x20. Each grid point has 1 anchor, and each anchor contains a vector 4 + nc long, where nc is the number of classes the model has. So for a 640 image, the output tensor will be [1, 84, 8400] (80 classes). Notice how the larger dimension is swapped when compared to YOLOv5.
Author
clayjay3 (claytonraycowen@gmail.com)
Date
2025-03-08
{
    /*
     * Permute the output tensor shape to match the expected format of the model. If the model is YOLOv8, the output
     * shape for a 640x640 image will be [1, 4 + nc, 8400] (nc = number of classes). Notice how the larger dimension is
     * swapped when compared to YOLOv5. We will permute the tensor to [1, 8400, 4 + nc] to make it easier to parse, then
     * squeeze the tensor to remove the batch dimension so the final shape will be [8400, 4 + nc]. Thanks pytorch for
     * being cool with the permute function.
     */
    // Permute the tensor shape from [1, 4 + nc, 8400] to [1, 8400, 4 + nc]
    // and then squeeze to remove the batch dimension, resulting in [8400, 4 + nc].
    torch::Tensor trPermuteOutput = trOutput.permute({0, 2, 1}).squeeze(0);

    // Move the tensor to the CPU if necessary. Element-wise access on a CUDA tensor would fail or be extremely slow.
    if (trPermuteOutput.device().is_cuda())
    {
        trPermuteOutput = trPermuteOutput.to(torch::kCPU);
    }
    // Convert tensor to float if necessary.
    if (trPermuteOutput.scalar_type() != torch::kFloat32)
    {
        trPermuteOutput = trPermuteOutput.to(torch::kFloat32);
    }
    // Ensure tensor is contiguous in memory.
    if (!trPermuteOutput.is_contiguous())
    {
        trPermuteOutput = trPermuteOutput.contiguous();
    }

    // Create an accessor for fast element-wise access.
    at::TensorAccessor trAccessor = trPermuteOutput.accessor<float, 2>();
    const int nNumDetections = trPermuteOutput.size(0);
    const int nTotalValues = trPermuteOutput.size(1);    // Equals 4 + number_of_classes.

    // Loop through each detection.
    for (int i = 0; i < nNumDetections; i++)
    {
        float fClassConfidence = -1.0f;
        int nClassID = -1;

        // Loop over class confidence values.
        for (int j = 4; j < nTotalValues; j++)
        {
            float fConfidence = trAccessor[i][j];
            if (fConfidence > fClassConfidence)
            {
                fClassConfidence = fConfidence;
                nClassID = j - 4;
            }
        }

        // Only process detections that meet the minimum confidence.
        if (fClassConfidence < fMinObjectConfidence)
        {
            continue;
        }

        // Retrieve bounding box data. YOLOv8 boxes are in model-input pixel coordinates.
        float fCenterX = trAccessor[i][0];
        float fCenterY = trAccessor[i][1];
        float fWidth = trAccessor[i][2];
        float fHeight = trAccessor[i][3];

        // Scale bounding box from model-input coordinates back to the original image size.
        // (Use the configured model input size instead of a hard-coded 640 so other export sizes work.)
        float fScaleX = static_cast<float>(cvInputFrameSize.width) / m_cvModelInputSize.width;
        float fScaleY = static_cast<float>(cvInputFrameSize.height) / m_cvModelInputSize.height;
        int nLeft = static_cast<int>((fCenterX - (0.5f * fWidth)) * fScaleX);
        int nTop = static_cast<int>((fCenterY - (0.5f * fHeight)) * fScaleY);
        int nBoxWidth = static_cast<int>(fWidth * fScaleX);
        int nBoxHeight = static_cast<int>(fHeight * fScaleY);
        cv::Rect cvBoundingBox(nLeft, nTop, nBoxWidth, nBoxHeight);

        // Append results.
        vClassIDs.push_back(nClassID);
        vClassConfidences.push_back(fClassConfidence);
        vBoundingBoxes.push_back(cvBoundingBox);
    }
}
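A minimal sketch, assuming LibTorch, of the permute-and-squeeze reshaping described above, on a dummy YOLOv8-style output tensor:

#include <torch/torch.h>
#include <iostream>

int main()
{
    torch::Tensor trOutput = torch::rand({1, 84, 8400});    // [1, 4 + nc, 8400] with nc = 80.
    torch::Tensor trParsed = trOutput.permute({0, 2, 1}).squeeze(0);
    std::cout << trParsed.sizes() << std::endl;             // Prints [8400, 84].
    return 0;
}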

The documentation for this class was generated from the following file: YOLOModel.hpp