Articles - Cajic Technology Partners LLC

Posted on January 20, 2026 by Dino Cajic

The Junior Developer Hiring Crisis Why landing your first tech job has never been harder. Maya is a 24-year-old CS grad with a 3.8 GPA and two internships. She applied to 387 software jobs in six months. Her callback rate was 2%. She couldn’t even get past automated screening. It felt like shouting into a void. This post is for anyone living that reality, and anyone hiring who wants to understand what changed. Jump to the data Introduction: Dreams vs. Reality for New Developers Not that long ago, a degree (or a bootcamp), a couple personal projects, and real effort

Top 10 Hidden Costs of AI Projects in Retail

Posted on January 19, 2026 by Dino Cajic

Top 10 Hidden Costs of AI Projects in Retail AI pilots can look cheap. Production is where budgets get tested. This page breaks down the 10 financial costs retail teams often miss, from data prep and cloud bills to integration, ongoing maintenance, and scaling. Talk Through Your Budget Why Retail AI Budgets Get Weird Picture this: you run a pilot on clean sample data, a few hundred SKUs, and a hand-assembled dashboard. It looks great. Then the business asks for 10,000 SKUs, promo pricing, store-level inventory, returns, and real-time decisions. That is when the hidden costs show up. Not because

How to Create an AI App

Posted on January 16, 2026 by Dino Cajic

How to Create an AI App An AI app is software that can understand inputs like text, voice, or images, learn from interactions, and respond with something useful. Sometimes that looks like a chatbot. Sometimes it looks like image recognition. Sometimes it is quietly helping a team make better decisions. If you have an idea for one, this page walks through what an AI app is, how the build typically goes, and what the best AI apps do differently. Talk Through Your Idea What Is an AI App? In plain English, an AI app is an application that does

Hardware & Infrastructure Security for AI

Posted on January 15, 2026 by Dino Cajic

Hardware & Infrastructure Security for AI AI workloads push crown-jewel assets (training data, prompts, embeddings, model weights, and logs) onto high-performance infrastructure where the attack surface isn’t just the app. This guide is written for IT security teams and focuses on securing the underlying stack across cloud and on-prem: confidential computing (TEEs), GPU risks, and hardened AI environments (network isolation, encryption of artifacts, access control, and monitoring). Jump to the sections How to think about AI infrastructure risk Traditional security models assume CPUs, kernels, and hypervisors are “trusted enough” and focus on app-layer controls. AI changes that assumption because training

AI Model Theft & IP Protection

Posted on January 14, 2026 by Dino Cajic

AI Model Theft & IP Protection AI models are expensive to train and they can be “stolen” without anyone downloading weights. If your model is exposed through a query API, a motivated attacker can try to clone its behavior by collecting inputs, harvesting outputs, and training a substitute model that acts the same. This page breaks down how model extraction works (in plain English, but technical), why it’s a real IP risk, and the defenses that actually help: watermarking, encryption/confidential computing, and smart API controls. Talk Security What Is a Model Extraction Attack? Model extraction (aka model stealing) is when

RAG 2.0: Structured, Self-Aware, Governed Retrieval

Posted on January 13, 2026 by Dino Cajic

RAG 2.0: Structured, Self-Aware, Governed Retrieval Retrieval-Augmented Generation (RAG) helps language models answer using your data instead of guessing. RAG 1.0 works, but it is a modular pipeline that can break in production. RAG 2.0 treats retrieval and generation as one system, adds smarter retrieval behavior, and makes compliance checks part of the flow. Get in Touch What is RAG? RAG is a setup where a language model pulls relevant documents at answer time, then uses that retrieved context to write a response. The goal is simple: fewer made-up answers and better alignment with what your sources actually say.

Reasoning Focused LLMs & Test Time Compute

Posted on January 12, 2026 by Dino Cajic

Reasoning-Focused LLMs & Test-Time Compute New “reasoning” language models don’t just answer… they work through the steps. The big shift is happening at inference time: models spend more compute to try, check, and refine. This deep dive breaks down what that means, why it helps (especially for math, code, and logic), and what you pay for the improvement. Start Reading What’s going on with “reasoning models”? A fun example: ask an AI how many “R” letters are in strawberry. Older models might guess. Reasoning-centric models will often spell it out and count. That step-by-step behavior is the point. It’s

The Current State of Multimodal Video Generation

Posted on January 9, 2026 by Dino Cajic

The Current State of Multimodal Video Generation Text to video has moved past “cool demo” territory. The real leap is that control, realism, and audio are landing together. This report breaks down what changed, how OpenAI’s Sora 2 compares to Google’s Veo 3.1, where teams are using these tools today, and what still needs work. Get in Touch Multimodal generation in plain English Multimodal generation is when a model can create across formats like text, images, audio, and video. Video is the hardest one to get right because it is not a single output. It is a sequence of frames

Small & Open-Weight Models Are Catching Up

Posted on January 8, 2026 by Dino Cajic

Small & Open-Weight Models Are Catching Up The performance gap between closed giants and open-weight models is shrinking fast. What’s changing the game is not just benchmark scores. It’s the combo of strong accuracy, much lower inference cost, and the ability to run and tune models on your own hardware. Jump to the TL;DR TL;DR Open-weight and smaller models (think Mistral 7B, Phi-2, Gemma, TinyLlama, and Mixtral) are now competitive on a lot of the benchmarks people actually care about: knowledge, reasoning, coding, and math. Closed models still lead at the very top end, but for many real products

Securing LLMs in Production

Posted on January 7, 2026 by Dino Cajic

Securing LLMs in Production LLMs make products feel magical, right up until someone realizes your chatbot can be manipulated with plain English. The new attack surface is the model’s behavior: what it will reveal, what it will believe, and what it can be tricked into doing. This page breaks down the real threats (prompt injection, data leakage, model theft, supply chain risks) and the platform options that help you defend them. Get in Touch Why LLM security is different Classic app security assumes your code follows rules. LLMs follow instructions, including instructions hidden inside documents, web pages, and chat