embedl_hub.core.profile package#
Profiler components and result types.
Re-exports#
- `ProfileError` — Raised on profiling failure.
- `ProfilingMethod` — Enum of profiling methods.
- `ONNXRuntimeProfiler` — ONNX Runtime profiler component.
- `ONNXRuntimeProfilingResult` — Output of ONNX Runtime profiling.
- `TensorRTProfiler` — TensorRT profiler component.
- `TensorRTProfilingResult` — Output of TensorRT profiling.
- `TFLiteProfiler` — TFLite profiler component.
- `TFLiteProfilingResult` — Output of TFLite profiling.
- class embedl_hub.core.profile.ONNXRuntimeProfiler(*, name: str | None = None, device: str | None = None, runs: int = 100, burn_ins: int = 10, cold_starts: int = 1, profiling_method: ProfilingMethod = ProfilingMethod.LAYERWISE)[source]#
Bases: `Component`

Component that profiles ONNX models.

Supports two device types:

- `qai_hub` devices: Profile via Qualcomm AI Hub cloud service.
- `embedl-onnxruntime` devices: Profile via `embedl-onnxruntime measure-latency` on a remote device over SSH.
Device-specific parameters (`embedl_onnxruntime_path`, `cli_args`) are configured via `EmbedlONNXRuntimeConfig` on the device.

- run(ctx: HubContext, model: ONNXRuntimeCompiledModel, *, device: str | None = None, runs: int = 100, burn_ins: int = 10, cold_starts: int = 1, profiling_method: ProfilingMethod = ProfilingMethod.LAYERWISE) → ONNXRuntimeProfilingResult[source]#

Profile an ONNX model via `embedl-onnxruntime measure-latency`.

Keyword arguments override the defaults set in the constructor. If a keyword argument is not provided here, the value from the constructor is used.
- Parameters:
ctx – The execution context with device configuration.
model – An `ONNXRuntimeCompiledModel` whose `path` artifact points to an ONNX model.

device – Name of the target device.

runs – Number of inference iterations. Only used on `embedl_onnxruntime` devices.

burn_ins – Number of warm-up iterations before measurement. Only used on `embedl_onnxruntime` devices.

cold_starts – Number of cold-start iterations. Only used on `embedl_onnxruntime` devices.

profiling_method – Method specifying how to measure execution time. See `ProfilingMethod`. Only used on `embedl_onnxruntime` devices.
- Returns:
An `ONNXRuntimeProfilingResult` with latency and FPS metrics.
- class embedl_hub.core.profile.ONNXRuntimeProfilingResult(artifact_dir: Path | None, devices: dict[str, DeviceLog], run_log: RunLog | None, latency: LoggedMetric, fps: LoggedMetric, output_file: LoggedArtifact | None)[source]#
Bases: `ComponentOutput`

Output from the ONNXRuntimeProfiler component.

Extends `ComponentOutput` with profiling-specific fields.

- latency#
The average inference latency in milliseconds.
- fps#
The inferred frames per second.
- output_file#
The artifact containing the JSON profile, if available.
- fps: LoggedMetric#
- latency: LoggedMetric#
- output_file: LoggedArtifact | None#
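The relationship between the two metrics is the standard reciprocal one: frames per second is derived from the average per-inference latency. A self-contained sketch of the arithmetic (the helper name is illustrative, not part of the embedl_hub API):

```python
def fps_from_latency(latency_ms: float) -> float:
    """Derive frames per second from an average per-inference latency in ms."""
    if latency_ms <= 0:
        raise ValueError("latency must be positive")
    # 1000 ms per second divided by ms per inference gives inferences per second.
    return 1000.0 / latency_ms

print(fps_from_latency(20.0))  # 20 ms per inference -> 50.0 FPS
```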
- exception embedl_hub.core.profile.ProfileError[source]#
Bases: `RuntimeError`

Raised when a profiling job fails.
- class embedl_hub.core.profile.ProfilingMethod(value)[source]#
Bases: `Enum`

Methods for measuring execution time during model profiling.
Profiling can be performed at several levels of granularity. Each method calculates execution time using a different approach, so results obtained with different methods are not directly comparable.
Not every method is natively supported by every provider. When a provider does not support the requested method, it may fall back to an equivalent method with a warning.
- LAYERWISE = 'layerwise'#
Use the runtime’s built-in profiling infrastructure to obtain per-layer (or per-operator) execution times.
TensorRT: Uses `trtexec --exportProfile` to produce a detailed per-layer JSON profile.

ONNX Runtime: Uses the native ONNX Runtime profiler (`--profiling-method onnxruntime`) to produce a detailed JSON profile with per-operator statistics.
- MODEL = 'model'#
Measure the total execution time of the model as reported by the runtime, without per-layer breakdown.
TensorRT: Uses `trtexec --exportTimes` to report per-inference latency including enqueue and data-transfer time.

ONNX Runtime: Falls back to `PYTHON` with a warning, since ONNX Runtime does not provide a native model-level timing mode distinct from wall-clock measurement.
- PYTHON = 'python'#
Use `time.time()` to measure wall-clock time on the remote system. The elapsed time is divided by the number of iterations to compute the average latency.
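The `PYTHON` method amounts to plain wall-clock averaging. A local sketch of the same arithmetic, including unmeasured warm-up iterations in the spirit of the `runs`/`burn_ins` parameters described earlier (the helper name and the stand-in workload are illustrative, not the library's remote implementation):

```python
import time
from typing import Callable

def measure_latency_ms(infer: Callable[[], None], *, runs: int = 100, burn_ins: int = 10) -> float:
    """Average wall-clock latency of `infer` in milliseconds."""
    for _ in range(burn_ins):   # warm-up iterations, not measured
        infer()
    start = time.time()
    for _ in range(runs):       # measured iterations
        infer()
    elapsed = time.time() - start
    # Total elapsed time divided by the iteration count, converted to ms.
    return elapsed / runs * 1000.0

# Example: time a trivial stand-in for model inference.
latency = measure_latency_ms(lambda: sum(range(1000)), runs=50, burn_ins=5)
print(f"{latency:.4f} ms")
```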
- class embedl_hub.core.profile.TFLiteProfiler(*, name: str | None = None, device: str | None = None, benchmark_params: TFLiteBenchmarkParams | None = None)[source]#
Bases: `Component`

Profile a compiled TFLite model.
Dispatches to a device-specific implementation based on the configured device type.
- run(ctx: HubContext, model: TFLiteCompiledModel, *, device: str | None = None, benchmark_params: TFLiteBenchmarkParams | None = None) → TFLiteProfilingResult[source]#
Profile a compiled TFLite model.
Keyword arguments override the defaults set in the constructor. If a keyword argument is not provided here, the value from the constructor is used.
- Parameters:
ctx – The execution context with device configuration.
model – The compiled TFLite model (from `TFLiteCompiler`).

device – Name of the target device.

benchmark_params – Optional TFLite benchmark parameters (only used by the AWS provider).
- Returns:
A `TFLiteProfilingResult` with latency and FPS metrics.
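The dispatch-by-device-type behavior mentioned above can be sketched with a simple registry. The device-type keys, handler functions, and return strings below are hypothetical, not the library's actual provider names or interfaces:

```python
from typing import Callable

# Hypothetical registry mapping a configured device type to a profiling backend.
_BACKENDS: dict[str, Callable[[str], str]] = {
    "qai_hub": lambda device: f"profiled {device} via Qualcomm AI Hub",
    "aws": lambda device: f"profiled {device} via the AWS provider",
}

def dispatch_profile(device_type: str, device: str) -> str:
    # Look up the backend registered for this device type and run it.
    try:
        backend = _BACKENDS[device_type]
    except KeyError:
        raise ValueError(f"no profiling backend for device type {device_type!r}")
    return backend(device)

print(dispatch_profile("aws", "device-0"))
```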
- class embedl_hub.core.profile.TFLiteProfilingResult(artifact_dir: Path | None, devices: dict[str, DeviceLog], run_log: RunLog | None, latency: LoggedMetric, fps: LoggedMetric, output_file: LoggedArtifact | None)[source]#
Bases: `ComponentOutput`

Output of a TFLite profiling step.
- Parameters:
latency – The measured latency in milliseconds.
fps – Frames per second derived from the latency.
output_file – An optional logged artifact with detailed profile data.
- fps: LoggedMetric#
- latency: LoggedMetric#
- output_file: LoggedArtifact | None#
- class embedl_hub.core.profile.TensorRTProfiler(*, name: str | None = None, device: str | None = None, profiling_method: ProfilingMethod = ProfilingMethod.LAYERWISE)[source]#
Bases: `Component`

Component that profiles TensorRT engines using `trtexec`.

Runs `trtexec` on a remote device over SSH to benchmark a compiled `.trt`/`.engine`/`.plan` engine and extracts latency and throughput metrics.

Device-specific parameters (`trtexec_path`, `trtexec_cli_args`) are configured via `TrtexecConfig` on the device. Per-component overrides can be set via `provider_config_overrides`.

- run(ctx: HubContext, model: TensorRTCompiledModel, *, device: str | None = None, profiling_method: ProfilingMethod = ProfilingMethod.LAYERWISE) → TensorRTProfilingResult[source]#

Profile a compiled TensorRT engine via `trtexec`.

Keyword arguments override the defaults set in the constructor. If a keyword argument is not provided here, the value from the constructor is used.
- Parameters:
ctx – The execution context with device configuration.
model – A `TensorRTCompiledModel` whose `path` artifact points to a compiled TensorRT engine.

device – Name of the target device.

profiling_method – Method specifying how to measure execution time. See `ProfilingMethod`.
- Returns:
A `TensorRTProfilingResult` with latency and FPS metrics.
- class embedl_hub.core.profile.TensorRTProfilingResult(artifact_dir: Path | None, devices: dict[str, DeviceLog], run_log: RunLog | None, latency: LoggedMetric, fps: LoggedMetric, output_file: LoggedArtifact | None)[source]#
Bases: `ComponentOutput`

Output from the TensorRTProfiler component.
- Parameters:
latency – The average inference latency in milliseconds.
fps – The inferred frames per second.
output_file – The artifact containing the per-layer profile JSON, if available.
- fps: LoggedMetric#
- latency: LoggedMetric#
- output_file: LoggedArtifact | None#