
Azure Machine Learning through Azure Monitor integration

New Relic offers an integration that reports your Microsoft Azure Machine Learning metrics and other data to New Relic. This document explains how to activate the integration and describes the data it reports.

Features

New Relic gathers metrics data from Azure Monitor for the Azure Machine Learning service. Azure Machine Learning is a cloud service for accelerating and managing the machine learning project lifecycle. Machine learning professionals, data scientists, and engineers can use it in their day-to-day workflows to train and deploy models or manage MLOps.

Using New Relic, you can:

Activate integration

Follow the standard Azure Monitor integration procedure to activate your Azure service in New Relic infrastructure monitoring.

Configuration and polling

You can change the polling frequency and filter data using configuration options.

New Relic queries your Azure Machine Learning service through the Azure Monitor integration according to a default polling interval.

Find and use data

To explore your integration data, go to one.newrelic.com/infra > Azure > (select an integration).
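Once the data is visible there, you may also want to chart a single metric with a query. The snippet below is a minimal sketch that only builds an NRQL query string. It assumes the integration's data is stored as dimensional metrics queryable with `FROM Metric`, and that the stored metric name may differ from the short names listed in this document (for example, it may carry a prefix). The helper `build_nrql` is hypothetical; check the data explorer for the exact names in your account.

```python
def build_nrql(metric_name: str, since_minutes: int = 60) -> str:
    """Return an NRQL query string that charts the average of one metric.

    Assumption: the data is queryable with `FROM Metric`. The metric name
    passed in must be the name as it actually appears in your account.
    """
    if since_minutes <= 0:
        raise ValueError("since_minutes must be positive")
    # Backticks let NRQL accept names that contain dots or spaces,
    # such as "Queued Runs".
    return (
        f"SELECT average(`{metric_name}`) FROM Metric "
        f"SINCE {since_minutes} minutes ago TIMESERIES"
    )


if __name__ == "__main__":
    # "ActiveCores" is a placeholder; the stored name may be different.
    print(build_nrql("ActiveCores"))
```

You could paste the printed string into the query builder and adjust the metric name and time window to match what the integration reports for you.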

Metric data

This integration collects the following metric data:

Azure Machine Learning metrics

Workspaces

The following table lists the metrics available for the Microsoft.MachineLearningServices/workspaces resource type.

| Metric | Description |
| --- | --- |
| ActiveCores | Number of active cores. |
| ActiveNodes | Number of active nodes. These are the nodes that are actively running a job. |
| CancelRequestedRuns | Number of runs where cancel was requested for this workspace. |
| CancelledRuns | Number of runs cancelled for this workspace. |
| CompletedRuns | Number of runs completed successfully for this workspace. |
| CpuCapacityMillicores | Maximum capacity of a CPU node in millicores. |
| CpuMemoryCapacityMegabytes | Maximum memory capacity of a CPU node in megabytes. |
| CpuMemoryUtilizationMegabytes | Memory utilization of a CPU node in megabytes. |
| CpuMemoryUtilizationPercentage | Memory utilization percentage of a CPU node. |
| CpuUtilization | Percentage of utilization on a CPU node. |
| CpuUtilizationMillicores | Utilization of a CPU node in millicores. |
| CpuUtilizationPercentage | Utilization percentage of a CPU node. |
| DiskAvailMegabytes | Available disk space in megabytes. |
| DiskReadMegabytes | Data read from disk in megabytes. |
| DiskUsedMegabytes | Used disk space in megabytes. |
| DiskWriteMegabytes | Data written to disk in megabytes. |
| Errors | Number of run errors in this workspace. |
| FailedRuns | Number of runs that failed for this workspace. |
| FinalizingRuns | Number of runs in the Finalizing state for this workspace. |
| GpuCapacityMilliGPUs | Maximum capacity of a GPU device in milli-GPUs. |
| GpuEnergyJoules | Interval energy in joules on a GPU node. |
| GpuMemoryCapacityMegabytes | Maximum memory capacity of a GPU device in megabytes. |
| GpuMemoryUtilization | Percentage of memory utilization on a GPU node. |
| GpuMemoryUtilizationMegabytes | Memory utilization of a GPU device in megabytes. |
| GpuMemoryUtilizationPercentage | Memory utilization percentage of a GPU device. |
| GpuUtilization | Percentage of utilization on a GPU node. |
| GpuUtilizationMilliGPUs | Utilization of a GPU device in milli-GPUs. |
| GpuUtilizationPercentage | Utilization percentage of a GPU device. |
| IBReceiveMegabytes | Network data received over InfiniBand in megabytes. |
| IBTransmitMegabytes | Network data sent over InfiniBand in megabytes. |
| IdleCores | Number of idle cores. |
| IdleNodes | Number of idle nodes. |
| LeavingCores | Number of leaving cores. |
| LeavingNodes | Number of leaving nodes. |
| ModelDeployFailed | Number of model deployments that failed in this workspace. |
| ModelDeployStarted | Number of model deployments started in this workspace. |
| ModelDeploySucceeded | Number of model deployments that succeeded in this workspace. |
| ModelRegisterFailed | Number of model registrations that failed in this workspace. |
| ModelRegisterSucceeded | Number of model registrations that succeeded in this workspace. |
| NetworkInputMegabytes | Network data received in megabytes. Metrics are aggregated in one-minute intervals. |
| NetworkOutputMegabytes | Network data sent in megabytes. Metrics are aggregated in one-minute intervals. |
| Not Responding Runs | Number of runs not responding for this workspace. |
| NotStartedRuns | Number of runs in the Not Started state for this workspace. |
| PreemptedCores | Number of preempted cores. |
| PreemptedNodes | Number of preempted nodes. |
| PreparingRuns | Number of runs that are preparing for this workspace. |
| Provisioning Runs | Number of runs that are provisioning for this workspace. |
| Queued Runs | Number of runs that are queued for this workspace. |
| QuotaUtilizationPercentage | Percent of quota utilized. |
| Started Runs | Number of runs running for this workspace. |
| Starting Runs | Number of runs started for this workspace. |
| StorageAPIFailureCount | Azure Blob Storage API calls failure count. |
| StorageAPISuccessCount | Azure Blob Storage API calls success count. |
| TotalCores | Number of total cores. |
| TotalNodes | Number of total nodes. |
| UnusableCores | Number of unusable cores. |
| UnusableNodes | Number of unusable nodes. |
| Warnings | Number of run warnings in this workspace. |
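Several of the workspace metrics come in pairs that can be combined into derived values. The sketch below shows two such calculations in Python: a utilization percentage from a used/capacity pair (for example `CpuUtilizationMillicores` and `CpuCapacityMillicores`), and a failure rate from the run counters. Both formulas are assumptions about how you might combine the values; this document does not state how Azure itself computes `CpuUtilizationPercentage`, and the function names are hypothetical.

```python
def utilization_percentage(used: float, capacity: float) -> float:
    """Return used/capacity as a percentage, or 0.0 when capacity is not positive.

    Assumption: `used` and `capacity` share a unit, such as millicores
    or megabytes.
    """
    if capacity <= 0:
        return 0.0
    return 100.0 * used / capacity


def run_failure_rate(completed: int, failed: int, cancelled: int) -> float:
    """Return the fraction of finished runs that failed.

    Assumption: "finished" means completed, failed, or cancelled; other
    run states (queued, preparing, and so on) are ignored.
    """
    finished = completed + failed + cancelled
    if finished == 0:
        return 0.0
    return failed / finished


if __name__ == "__main__":
    # Hypothetical readings: 1500 of 4000 millicores in use.
    print(utilization_percentage(1500, 4000))  # 37.5
    # Hypothetical counters: 90 completed, 8 failed, 2 cancelled.
    print(run_failure_rate(90, 8, 2))  # 0.08
```

Guarding against a zero denominator matters here because a workspace with no nodes or no finished runs would otherwise raise a division error.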

The following table lists the metrics available for the Microsoft.MachineLearningServices/workspaces/onlineEndpoints/deployments resource type.

| Metric | Description |
| --- | --- |
| CpuMemoryUtilizationPercentage | Percentage of memory utilization on an instance. |
| CpuUtilizationPercentage | Percentage of CPU utilization on an instance. |
| DataCollectionErrorsPerMinute | The number of data collection events dropped per minute. |
| DataCollectionEventsPerMinute | The number of data collection events processed per minute. |
| DeploymentCapacity | The number of instances in the deployment. |
| DiskUtilization | Percentage of disk utilization on an instance. |
| GpuEnergyJoules | Interval energy in joules on a GPU node. |
| GpuMemoryUtilizationPercentage | Percentage of GPU memory utilization on an instance. |
| GpuUtilizationPercentage | Percentage of GPU utilization on an instance. |
| RequestLatency_P50 | The average P50 request latency. |
| RequestLatency_P90 | The average P90 request latency. |
| RequestLatency_P95 | The average P95 request latency. |
| RequestLatency_P99 | The average P99 request latency. |
| RequestsPerMinute | The number of requests sent to the online deployment within a minute. |

The following table lists the metrics available for the Microsoft.MachineLearningServices/workspaces/onlineEndpoints resource type.

| Metric | Description |
| --- | --- |
| ConnectionsActive | The total number of concurrent TCP connections active from clients. |
| DataCollectionErrorsPerMinute | The number of data collection events dropped per minute. |
| DataCollectionEventsPerMinute | The number of data collection events processed per minute. |
| NetworkBytes | The bytes per second served for the endpoint. |
| NewConnectionsPerSecond | The average number of new TCP connections per second established from clients. |
| RequestLatency | The average total time taken to respond to a request, in milliseconds. |
| RequestLatency_P50 | The average P50 request latency, aggregated over all request latency values collected during the selected time period. |
| RequestLatency_P90 | The average P90 request latency, aggregated over all request latency values collected during the selected time period. |
| RequestLatency_P95 | The average P95 request latency, aggregated over all request latency values collected during the selected time period. |
| RequestLatency_P99 | The average P99 request latency, aggregated over all request latency values collected during the selected time period. |
| RequestsPerMinute | The number of requests sent to the online endpoint within a minute. |
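To make the P50, P90, P95, and P99 latency metrics easier to interpret, the following sketch computes percentiles from a list of latency samples in Python. It uses the nearest-rank method, which is an assumption chosen for simplicity; this document does not say which percentile method Azure uses, and the `percentile` helper and sample values are hypothetical.

```python
import math
from typing import Sequence


def percentile(samples: Sequence[float], p: int) -> float:
    """Return the p-th percentile of `samples` using the nearest-rank method.

    The result is the smallest sample such that at least p percent of
    all samples are less than or equal to it.
    """
    if not samples:
        raise ValueError("samples must not be empty")
    if not 0 < p <= 100:
        raise ValueError("p must be in the range 1..100")
    ordered = sorted(samples)
    # Nearest rank: ceil(p/100 * n), converted to a zero-based index.
    rank = math.ceil(p * len(ordered) / 100)
    return ordered[rank - 1]


if __name__ == "__main__":
    # Hypothetical request latencies in milliseconds.
    latencies_ms = [120, 80, 200, 95, 150]
    for p in (50, 90, 95, 99):
        print(f"P{p}: {percentile(latencies_ms, p)} ms")
```

With only five samples, P90, P95, and P99 all resolve to the slowest request (200 ms), which illustrates why high percentiles are only meaningful over a reasonably large number of requests.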

Copyright © 2024 New Relic Inc.
