AI Evaluation and Observability Associate

PrismWorks AI - Remote

Job details

Contract | Freelance
$17.75–$80.00 an hour
11 days ago

Qualifications

Microsoft Windows Server
Law
Distributed systems
Computer networking
GitHub

Full job description

Apply here: https://prismworks.ai/careers?job=HR-OPN-2026-0002&utm_source=indeed&utm_medium=job_board&utm_campaign=associate_hiring

Location: Global - Remote (Anywhere in the world)
Work model: 100% remote
Engagement type: Independent associate / contractor

AI Evaluation and Observability Associate

PrismWorks AI is building a vetted independent associate network for enterprise AI implementation. We are looking for practitioners who can help enterprises measure, monitor, and improve AI system behavior before and after production deployment. PrismWorks focuses on production AI systems where reliability, evidence, traceability, and operational control matter.

The Work

You will design and implement evaluation harnesses, evidence capture, observability, incident replay, quality checks, and operational metrics for agentic, RAG, and AI workflow systems. The work turns AI behavior from anecdotal demo output into something teams can test, review, operate, and improve.

Responsibilities

Design evaluation strategies for agentic workflows, retrieval systems, and AI-assisted processes.
Build test datasets, scoring approaches, regression checks, and quality gates.
Implement observability for prompts, tools, retrieval, model responses, latency, cost, errors, and approval paths.
Define evidence capture and incident replay patterns.
Support production SLOs, dashboards, alerting, and governance reporting.
Help delivery teams compare model, retrieval, and workflow changes safely.
Collaborate with agent, data, cloud, governance, integration, and delivery associates.
Create reusable eval, monitoring, and incident review templates.

Strong Fit

Background in AI engineering, ML evaluation, SRE, observability, QA automation, data quality, platform engineering, or reliability engineering.
Experience turning ambiguous behavior into measurable checks and operational signals.
Strong scripting, data analysis, or engineering implementation ability.
Comfortable with both technical metrics and stakeholder-facing reporting.

Useful Experience

LLM evals, RAG evals, prompt/version testing, model comparison, observability pipelines, OpenTelemetry, dashboards, tracing, SLOs, or incident review.
LangSmith, custom eval harnesses, vector search metrics, model gateways, log pipelines, warehouse analytics, or monitoring systems.
Regulated or high-impact workflows where auditability and repeatability matter.

Engagement Model

Associates are engaged through independent contractor agreements for defined scopes. Project availability depends on client demand, role fit, and scheduling. Associates should be available for core collaboration hours aligned to Canada/USA time zones, and some engagements may require availability in the client's timezone.

Express Interest

Send a short note with your evaluation, observability, AI engineering, reliability, QA, or data quality background; your preferred engagement model, location/time zone, and working-hour constraints; and links to relevant examples, writing, dashboards, GitHub, or delivery artifacts.

Pay: $17.75-$80.00 per hour

Work Location: Remote
