embedl_hub.core.profile package#

Profiler components and result types.

Re-exports#

class embedl_hub.core.profile.ONNXRuntimeProfiler(*, name: str | None = None, device: str | None = None, runs: int = 100, burn_ins: int = 10, cold_starts: int = 1, profiling_method: ProfilingMethod = ProfilingMethod.LAYERWISE)[source]#

Bases: Component

Component that profiles ONNX models.

Supports two device types:

  • qai_hub devices: Profile via Qualcomm AI Hub cloud service.

  • embedl-onnxruntime devices: Profile via embedl-onnxruntime measure-latency on a remote device over SSH.

Device-specific parameters (embedl_onnxruntime_path, cli_args) are configured via EmbedlONNXRuntimeConfig on the device.

run(ctx: HubContext, model: ONNXRuntimeCompiledModel, *, device: str | None = None, runs: int = 100, burn_ins: int = 10, cold_starts: int = 1, profiling_method: ProfilingMethod = ProfilingMethod.LAYERWISE) → ONNXRuntimeProfilingResult[source]#

Profile an ONNX model via embedl-onnxruntime measure-latency.

Keyword arguments override the defaults set in the constructor. If a keyword argument is not provided here, the value from the constructor is used.

Parameters:
  • ctx – The execution context with device configuration.

  • model – An ONNXRuntimeCompiledModel whose path artifact points to an ONNX model.

  • device – Name of the target device.

  • runs – Number of inference iterations. Only used on embedl_onnxruntime devices.

  • burn_ins – Number of warm-up iterations before measurement. Only used on embedl_onnxruntime devices.

  • cold_starts – Number of cold-start iterations. Only used on embedl_onnxruntime devices.

  • profiling_method – Method specifying how to measure execution time. See ProfilingMethod. Only used on embedl_onnxruntime devices.

Returns:

An ONNXRuntimeProfilingResult with latency and FPS metrics.

run_type: ClassVar[RunType] = 'PROFILE'#
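
Based on the signatures above, a typical invocation might look like the following sketch. The device name and the `ctx` / `compiled_model` variables are illustrative placeholders; constructing a HubContext and compiling the model are out of scope here, so this fragment is not runnable on its own.

```python
from embedl_hub.core.profile import ONNXRuntimeProfiler, ProfilingMethod

# Defaults set here apply to every run() call unless overridden per call.
profiler = ONNXRuntimeProfiler(runs=200, burn_ins=20)

# `ctx` and `compiled_model` are assumed to exist already (illustrative names):
# ctx = <a HubContext with device configuration>
# compiled_model = <an ONNXRuntimeCompiledModel from a compile step>
result = profiler.run(
    ctx,
    compiled_model,
    device="my-ssh-device",  # illustrative device name
    profiling_method=ProfilingMethod.MODEL,
)
print(result.latency, result.fps)
```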
class embedl_hub.core.profile.ONNXRuntimeProfilingResult(artifact_dir: Path | None, devices: dict[str, DeviceLog], run_log: RunLog | None, latency: LoggedMetric, fps: LoggedMetric, output_file: LoggedArtifact | None)[source]#

Bases: ComponentOutput

Output from the ONNXRuntimeProfiler component.

Extends ComponentOutput with profiling-specific fields.

latency#

The average inference latency in milliseconds.

fps#

Frames per second derived from the measured latency.

output_file#

The artifact containing the JSON profile, if available.

fps: LoggedMetric#
latency: LoggedMetric#
output_file: LoggedArtifact | None#
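
The relationship between the two metrics is simple: fps is the reciprocal of the average latency, converted from milliseconds. A minimal sketch of that derivation, assuming the usual 1000 / latency_ms conversion (the helper name is illustrative, not part of the package):

```python
def fps_from_latency_ms(latency_ms: float) -> float:
    """Derive throughput (frames per second) from average latency in ms."""
    if latency_ms <= 0:
        raise ValueError("latency must be positive")
    return 1000.0 / latency_ms

# An 8 ms average latency corresponds to 125 frames per second.
fps = fps_from_latency_ms(8.0)
```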
exception embedl_hub.core.profile.ProfileError[source]#

Bases: RuntimeError

Raised when a profiling job fails.

class embedl_hub.core.profile.ProfilingMethod(value)[source]#

Bases: Enum

Methods for measuring execution time during model profiling.

Profiling can be done with multiple levels of granularity. Each method calculates the execution time using a different approach, so the results are not directly comparable.

Not every method is natively supported by every provider. When a provider does not support the requested method, it may fall back to an equivalent method with a warning.

LAYERWISE = 'layerwise'#

Use the runtime’s built-in profiling infrastructure to obtain per-layer (or per-operator) execution times.

  • TensorRT: Uses trtexec --exportProfile to produce a detailed per-layer JSON profile.

  • ONNX Runtime: Uses the native ONNX Runtime profiler (--profiling-method onnxruntime) to produce a detailed JSON profile with per-operator statistics.

MODEL = 'model'#

Measure the total execution time of the model as reported by the runtime, without per-layer breakdown.

  • TensorRT: Uses trtexec --exportTimes to report per-inference latency including enqueue and data-transfer time.

  • ONNX Runtime: Falls back to PYTHON with a warning, since ONNX Runtime does not provide a native model-level timing mode distinct from wall-clock measurement.

PYTHON = 'python'#

Use time.time() to measure wall-clock time on the remote system. The elapsed time is divided by the number of iterations to compute the average latency.
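
The PYTHON method above amounts to wall-clock timing around a batch of iterations, with warm-up runs excluded. A self-contained sketch of that approach (the function name is illustrative; burn-in handling mirrors the burn_ins parameter described earlier):

```python
import time

def measure_average_latency_ms(infer, runs: int = 100, burn_ins: int = 10) -> float:
    """Wall-clock average latency of `infer` in milliseconds.

    Burn-in iterations warm caches and are excluded from the measurement;
    the elapsed time over `runs` iterations is then divided by `runs`.
    """
    for _ in range(burn_ins):
        infer()
    start = time.time()
    for _ in range(runs):
        infer()
    elapsed = time.time() - start
    return elapsed * 1000.0 / runs

# Example with a dummy "model" that sleeps ~1 ms per call:
latency_ms = measure_average_latency_ms(lambda: time.sleep(0.001), runs=20, burn_ins=2)
```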

class embedl_hub.core.profile.TFLiteProfiler(*, name: str | None = None, device: str | None = None, benchmark_params: TFLiteBenchmarkParams | None = None)[source]#

Bases: Component

Profile a compiled TFLite model.

Dispatches to a device-specific implementation based on the configured device type.

run(ctx: HubContext, model: TFLiteCompiledModel, *, device: str | None = None, benchmark_params: TFLiteBenchmarkParams | None = None) → TFLiteProfilingResult[source]#

Profile a compiled TFLite model.

Keyword arguments override the defaults set in the constructor. If a keyword argument is not provided here, the value from the constructor is used.

Parameters:
  • ctx – The execution context with device configuration.

  • model – The compiled TFLite model (from TFLiteCompiler).

  • device – Name of the target device.

  • benchmark_params – Optional TFLite benchmark parameters (only used by the AWS provider).

Returns:

A TFLiteProfilingResult with latency and FPS metrics.

run_type: ClassVar[RunType] = 'PROFILE'#
class embedl_hub.core.profile.TFLiteProfilingResult(artifact_dir: Path | None, devices: dict[str, DeviceLog], run_log: RunLog | None, latency: LoggedMetric, fps: LoggedMetric, output_file: LoggedArtifact | None)[source]#

Bases: ComponentOutput

Output of a TFLite profiling step.

Parameters:
  • latency – The measured latency in milliseconds.

  • fps – Frames per second derived from the latency.

  • output_file – An optional logged artifact with detailed profile data.

fps: LoggedMetric#
latency: LoggedMetric#
output_file: LoggedArtifact | None#
class embedl_hub.core.profile.TensorRTProfiler(*, name: str | None = None, device: str | None = None, profiling_method: ProfilingMethod = ProfilingMethod.LAYERWISE)[source]#

Bases: Component

Component that profiles TensorRT engines using trtexec.

Runs trtexec on a remote device over SSH to benchmark a compiled .trt / .engine / .plan engine and extracts latency and throughput metrics.

Device-specific parameters (trtexec_path, trtexec_cli_args) are configured via TrtexecConfig on the device. Per-component overrides can be set via provider_config_overrides.

run(ctx: HubContext, model: TensorRTCompiledModel, *, device: str | None = None, profiling_method: ProfilingMethod = ProfilingMethod.LAYERWISE) → TensorRTProfilingResult[source]#

Profile a compiled TensorRT engine via trtexec.

Keyword arguments override the defaults set in the constructor. If a keyword argument is not provided here, the value from the constructor is used.

Parameters:
  • ctx – The execution context with device configuration.

  • model – A TensorRTCompiledModel whose path artifact points to a compiled TensorRT engine.

  • device – Name of the target device.

  • profiling_method – Method specifying how to measure execution time. See ProfilingMethod.

Returns:

A TensorRTProfilingResult with latency and FPS metrics.

run_type: ClassVar[RunType] = 'PROFILE'#
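
With the MODEL method, trtexec reports per-inference timings and the profiler's job reduces to aggregation. A minimal sketch of that aggregation step, assuming a plain list of per-iteration latencies in milliseconds and the 1000 / avg FPS derivation; the actual record layout of trtexec's JSON output is not reproduced here, and the function name is illustrative:

```python
from statistics import mean

def aggregate_latencies_ms(per_iteration_ms: list[float]) -> tuple[float, float]:
    """Return (average latency in ms, derived FPS) from per-iteration timings."""
    avg = mean(per_iteration_ms)
    return avg, 1000.0 / avg

avg_ms, fps = aggregate_latencies_ms([4.0, 5.0, 6.0])
```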
class embedl_hub.core.profile.TensorRTProfilingResult(artifact_dir: Path | None, devices: dict[str, DeviceLog], run_log: RunLog | None, latency: LoggedMetric, fps: LoggedMetric, output_file: LoggedArtifact | None)[source]#

Bases: ComponentOutput

Output from the TensorRTProfiler component.

Parameters:
  • latency – The average inference latency in milliseconds.

  • fps – Frames per second derived from the measured latency.

  • output_file – The artifact containing the per-layer profile JSON, if available.

fps: LoggedMetric#
latency: LoggedMetric#
output_file: LoggedArtifact | None#