Working with Jobs

Querying Jobs

To programmatically get the list of job summaries, use get_job_summaries().

import qai_hub as hub

client = hub.Client()
job_summaries = client.get_job_summaries(limit=10)
print(job_summaries)

Given a job ID from the UI (an ID starting with j, e.g. jvgdwk7z5), you can query the job programmatically with get_job().

job = client.get_job("jvgdwk7z5")
print(job)
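
Because job IDs copied from the UI start with j, a cheap check before calling get_job() can catch a mistyped or mis-pasted ID early. The following is a minimal sketch; looks_like_job_id is a hypothetical helper, not part of qai_hub, and everything beyond the j prefix (minimum length, no whitespace) is an assumption made for illustration.

```python
def looks_like_job_id(candidate) -> bool:
    """Return True if the value plausibly is a job ID copied from the UI."""
    # Only the leading 'j' comes from the documentation; the length and
    # whitespace checks are assumptions added for this sketch.
    return (
        isinstance(candidate, str)
        and len(candidate) > 1
        and candidate.startswith("j")
        and not any(ch.isspace() for ch in candidate)
    )


print(looks_like_job_id("jvgdwk7z5"))
print(looks_like_job_id(" jvgdwk7z5"))
```

A check like this only filters obvious mistakes; whether the job exists is still decided by the service when get_job() is called.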

Profile Jobs

The results of a profile job can be obtained programmatically from a ProfileJob object by calling download_profile():

profile = job.download_profile()
print(profile)

The execution_summary entry of the returned dictionary looks like this:

{
    'estimated_inference_time': 2997,
    'estimated_inference_peak_memory': 69177344,
    'first_load_time': 2162619,
    'first_load_peak_memory': 83742720,
    'warm_load_time': 123904,
    'warm_load_peak_memory': 73179136,
    'compile_time': 0,
    'compile_peak_memory': 0,
    'compile_memory_increase_range': None,
    'compile_memory_peak_range': None,
    'first_load_memory_increase_range': (0, 0),
    'first_load_memory_peak_range': (26226688, 31730736),
    'warm_load_memory_increase_range': (0, 10580480),
    'warm_load_memory_peak_range': (12865536, 37318656),
    'inference_memory_increase_range': (0, 12160),
    'inference_memory_peak_range': (12288, 21276192),
    'all_compile_times': [],
    'all_first_load_times': [2162619],
    'all_warm_load_times': [123904],
    'all_inference_times': [9130, .... ]
}

Memory is reported in bytes and times in microseconds. To get the estimated inference latency in milliseconds:

latency_ms = profile["execution_summary"]["estimated_inference_time"] / 1000
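
The same unit conversion can be applied to every scalar field at once. The sketch below is illustrative only: summarize_profile is a hypothetical helper, the sample values are copied from the dictionary shown above, and it assumes that keys ending in _time hold microseconds and keys ending in _peak_memory hold bytes.

```python
US_PER_MS = 1_000
BYTES_PER_MIB = 1024 * 1024


def summarize_profile(summary: dict) -> dict:
    """Convert scalar time fields to ms and peak-memory fields to MiB."""
    readable = {}
    for key, value in summary.items():
        # Skip ranges (tuples), per-run lists, and None values.
        if isinstance(value, bool) or not isinstance(value, (int, float)):
            continue
        if key.endswith("_time"):
            readable[key + "_ms"] = value / US_PER_MS
        elif key.endswith("_peak_memory"):
            readable[key + "_mib"] = value / BYTES_PER_MIB
    return readable


# A subset of the sample values printed above.
sample = {
    "estimated_inference_time": 2997,
    "estimated_inference_peak_memory": 69177344,
    "first_load_time": 2162619,
    "first_load_peak_memory": 83742720,
    "warm_load_time": 123904,
    "compile_memory_peak_range": None,
    "all_first_load_times": [2162619],
}

for name, value in summarize_profile(sample).items():
    print(f"{name}: {value:.3f}")
```

With the sample values, the estimated inference time comes out as 2.997 ms and the estimated peak inference memory as roughly 65.973 MiB.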

Model IO Specs

Compiled and linked models have input and output tensor specifications that describe the name, shape, and data type of each tensor. These are available via input_spec and output_spec:

model = job.get_target_model()

for tensor in model.input_spec[None]:
    print(f"{tensor.name}: shape={tensor.shape}, dtype={tensor.dtype}")

for tensor in model.output_spec[None]:
    print(f"{tensor.name}: shape={tensor.shape}, dtype={tensor.dtype}")

The input_spec and output_spec attributes are dictionaries that map each graph name to a list of TensorSpec objects. For single-graph models the key is None. Multi-graph models (e.g. linked context binaries) use the graph name as the key:

# Linked model with multiple graphs
for graph_name, tensors in model.input_spec.items():
    for tensor in tensors:
        print(f"[{graph_name}] {tensor.name}: shape={tensor.shape}")
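
Since a spec is a plain mapping from graph name to a list of tensors, it is easy to derive per-tensor buffer sizes from it. The sketch below does not import qai_hub: FakeTensorSpec is a hypothetical stand-in exposing only the name, shape, and dtype attributes used above, and it assumes that dtype prints as a NumPy-style name such as float32.

```python
from dataclasses import dataclass
from math import prod


@dataclass
class FakeTensorSpec:
    """Hypothetical stand-in for a tensor spec (name, shape, dtype only)."""
    name: str
    shape: tuple
    dtype: str


# Assumption: dtype names follow NumPy conventions; extend as needed.
BYTES_PER_ELEMENT = {
    "float32": 4, "float16": 2, "int32": 4,
    "uint16": 2, "int8": 1, "uint8": 1,
}


def spec_sizes(spec: dict) -> dict:
    """Return {(graph_name, tensor_name): size_in_bytes} for a spec mapping."""
    sizes = {}
    for graph_name, tensors in spec.items():
        for tensor in tensors:
            n_elements = prod(tensor.shape)
            sizes[(graph_name, tensor.name)] = (
                n_elements * BYTES_PER_ELEMENT[str(tensor.dtype)]
            )
    return sizes


# Single-graph model: the only key is None.
example = {None: [FakeTensorSpec("image", (1, 3, 224, 224), "float32")]}
print(spec_sizes(example))
```

Keying the result on (graph_name, tensor_name) keeps the helper usable for both single-graph and multi-graph models without special cases.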

Quantized models also include scale and zero_point on each tensor spec:

for tensor in model.input_spec[None]:
    if tensor.scale is not None:
        print(f"{tensor.name}: scale={tensor.scale}, zero_point={tensor.zero_point}")
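
To show how scale and zero_point are typically used, the sketch below applies the common affine quantization convention, real = scale * (q - zero_point). Whether a given model follows exactly this sign convention is an assumption here and should be checked against the runtime in use; dequantize and quantize are hypothetical helpers, not qai_hub functions.

```python
def dequantize(values, scale, zero_point):
    """Map stored integers to real values: real = scale * (q - zero_point)."""
    return [scale * (q - zero_point) for q in values]


def quantize(values, scale, zero_point, qmin=0, qmax=255):
    """Inverse mapping, rounded and clamped to [qmin, qmax] (uint8 by default)."""
    return [
        min(qmax, max(qmin, round(x / scale) + zero_point))
        for x in values
    ]


# Illustrative parameters; real values come from the tensor spec.
scale, zero_point = 0.5, 128
stored = [128, 130, 0, 255]

real = dequantize(stored, scale, zero_point)
print(real)
print(quantize(real, scale, zero_point))
```

With these parameters the stored value 128 maps to 0.0, and quantizing the dequantized values returns the original integers.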

Downloading Job Artifacts

Jobs produce artifacts during execution. Depending on the job type, artifacts can include server logs, device logs, op-by-op timing summaries, or QNN HTP Analysis Summaries (QHAS). These are useful for troubleshooting job failures or understanding job behavior.

To see which artifacts are available for a completed job:

artifacts = job.get_available_artifacts()
print(artifacts)
# e.g. [<JobArtifactType.HUB_LOG: 5>, <JobArtifactType.DEVICE_LOG: 1>]

To download a specific artifact type:

paths = job.download_artifacts_for_type("./artifacts", hub.JobArtifactType.HUB_LOG)

To download all available logs (device log and hub log):

paths = job.download_job_logs("./artifacts")

Logs are also automatically included when calling download_results:

results = job.download_results("./results")
# results directory will contain the job's primary output (e.g. compiled model)
# as well as any available log files

Available artifact types are listed in JobArtifactType. Not every type is available for every job; call get_available_artifacts() to check what a specific job provides.
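
The availability check and the download can be combined into one guard. This is a minimal sketch: download_if_available is a hypothetical helper built only on the two methods shown above, and it assumes that download_artifacts_for_type returns a list of paths, so an empty list is used when the artifact type is not available.

```python
def download_if_available(job, artifact_type, out_dir):
    """Download one artifact type only if the job reports it as available."""
    if artifact_type not in job.get_available_artifacts():
        # Assumption: callers expect a list of paths, so return an empty one.
        return []
    return job.download_artifacts_for_type(out_dir, artifact_type)


# Intended usage with a real job object (not executed here):
#   paths = download_if_available(job, hub.JobArtifactType.HUB_LOG, "./artifacts")
```

Returning an empty list instead of raising lets a script loop over several artifact types without special-casing jobs that lack some of them.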

Note

The format, content, and availability of job artifacts may change without notice. They are not part of the stable API contract.