When I started building CROW, my SaaS for automotive repair shops, I knew I wanted AI-powered features. The vision was simple: help shop owners and car owners make better maintenance decisions using AI.
The challenge? OpenAI API costs can spiral out of control fast. A naive implementation could easily cost hundreds of dollars per day at scale.
Here's how I built AI features that are both useful and affordable.
The Feature: AI Maintenance Recommendations
CROW's core AI feature analyzes a vehicle's service history, mileage, and age to provide personalized maintenance recommendations. Instead of a generic "change oil every 5,000 miles," it considers the vehicle's actual history, usage, and operating conditions.
A typical prompt might look like:
```
Vehicle: 2019 Honda Accord, 45,000 miles
Last oil change: 4,200 miles ago
Last brake service: 18 months ago
Climate: Canadian winter
Recent issues: None

What maintenance should be prioritized?
```
The Cost Problem
Let's do the math with GPT-4o-mini. A single request costs a fraction of a cent, and even a few thousand users checking in weekly is still manageable. But what if users ask follow-up questions? What if they check daily? What if we scale to 100,000 users?
The costs multiply fast, and margins in SaaS are everything.
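To make the multiplication concrete, here's a back-of-the-envelope estimator. The per-token prices and usage figures below are illustrative assumptions for the sketch, not current OpenAI rates:

```python
def monthly_cost(users: int, requests_per_user_per_day: int,
                 input_tokens: int, output_tokens: int,
                 price_in_per_m: float, price_out_per_m: float) -> float:
    """Estimate monthly API spend in dollars."""
    requests = users * requests_per_user_per_day * 30
    cost_per_request = (input_tokens * price_in_per_m +
                        output_tokens * price_out_per_m) / 1_000_000
    return requests * cost_per_request

# Illustrative: 100,000 users, 2 requests/day, 300 input + 500 output
# tokens per request, at $0.15 / $0.60 per million tokens
print(f"${monthly_cost(100_000, 2, 300, 500, 0.15, 0.60):,.2f} per month")
```

Plug in your own traffic numbers; the point is that every term in that product grows with success, so the total grows multiplicatively.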
Strategy 1: Aggressive Caching
The same vehicle with the same history should get the same recommendations. I implemented Redis caching keyed on semantically equivalent inputs rather than exact ones:
```python
import hashlib
import json
import os

from redis import Redis
from openai import OpenAI

redis = Redis.from_url(os.environ["REDIS_URL"])
client = OpenAI()

def get_maintenance_recommendations(vehicle_data: dict) -> str:
    # Create a cache key from the relevant vehicle attributes
    cache_key = create_cache_key(vehicle_data)

    # Check cache first
    cached = redis.get(f"ai:maintenance:{cache_key}")
    if cached:
        return cached.decode()

    # Generate new recommendations
    response = client.chat.completions.create(
        model="gpt-4o-mini",
        messages=[
            {"role": "system", "content": SYSTEM_PROMPT},
            {"role": "user", "content": format_vehicle_prompt(vehicle_data)},
        ],
        max_tokens=500,
    )
    result = response.choices[0].message.content

    # Cache for 7 days (recommendations don't change that fast)
    redis.setex(f"ai:maintenance:{cache_key}", 604800, result)
    return result

def create_cache_key(vehicle_data: dict) -> str:
    # Only include fields that affect recommendations
    relevant_fields = {
        "make": vehicle_data["make"],
        "model": vehicle_data["model"],
        "year": vehicle_data["year"],
        "mileage_bucket": vehicle_data["mileage"] // 5000 * 5000,  # Floor to 5k
        "last_service_bucket": vehicle_data["days_since_service"] // 30,  # Floor to month
        "climate": vehicle_data["climate"],
    }
    return hashlib.md5(json.dumps(relevant_fields, sort_keys=True).encode()).hexdigest()
```
Why `mileage_bucket`? Instead of caching per exact mileage (45,234 miles), we round down to the nearest 5,000. A car at 45,234 miles gets the same cache entry as one at 47,891 miles. This dramatically increases cache hit rates with minimal accuracy loss.
This single optimization reduced our AI API calls by 60%.
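As a quick sanity check, here's a stripped-down version of the key function (reduced to just the two bucketed fields) showing nearby inputs collapsing into one cache entry:

```python
import hashlib
import json

def bucket_key(mileage: int, days_since_service: int) -> str:
    # Same bucketing idea as create_cache_key, reduced to two fields
    fields = {
        "mileage_bucket": mileage // 5000 * 5000,
        "last_service_bucket": days_since_service // 30,
    }
    return hashlib.md5(json.dumps(fields, sort_keys=True).encode()).hexdigest()

# 45,234 and 47,891 miles both floor to the 45,000 bucket
assert bucket_key(45_234, 95) == bucket_key(47_891, 110)
# Crossing a bucket boundary produces a different key (and a cache miss)
assert bucket_key(50_100, 95) != bucket_key(45_234, 95)
```

The trade-off is explicit: a wider bucket means more cache hits but coarser recommendations, so tune the bucket sizes to how fast your advice actually changes.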
Strategy 2: Prompt Engineering for Efficiency
Shorter prompts = fewer tokens = lower costs. But they also need to be effective.
Before (verbose):
```
You are an expert automotive maintenance advisor. You have deep knowledge
about all makes and models of vehicles. Your job is to analyze the vehicle
information provided and give detailed maintenance recommendations...
[500+ tokens of instructions]
```

After (optimized):

```
Auto maintenance advisor. Respond with JSON: {"priority": [...], "upcoming": [...], "notes": "..."}
Rules: prioritize safety items, consider climate, be specific about intervals.
```

The optimized version keeps the constraints that matter (the JSON shape, safety first, climate awareness) while dropping hundreds of tokens of preamble from every single request.
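Because the system prompt is resent with every request, trimming it pays on every call. A rough sketch of the savings, using the common ~4-characters-per-token heuristic (for exact counts you'd use a real tokenizer such as tiktoken):

```python
VERBOSE_PROMPT = (
    "You are an expert automotive maintenance advisor. You have deep knowledge "
    "about all makes and models of vehicles. Your job is to analyze the vehicle "
    "information provided and give detailed maintenance recommendations..."
)  # the real verbose prompt ran 500+ tokens; this is just its opening
COMPACT_PROMPT = (
    'Auto maintenance advisor. Respond with JSON: '
    '{"priority": [...], "upcoming": [...], "notes": "..."}\n'
    "Rules: prioritize safety items, consider climate, be specific about intervals."
)

def approx_tokens(text: str) -> int:
    # Rough heuristic: ~4 characters per token for English text
    return max(1, len(text) // 4)

saved_per_request = approx_tokens(VERBOSE_PROMPT) - approx_tokens(COMPACT_PROMPT)
```

Multiply `saved_per_request` by your request volume and it stops being a rounding error.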
Strategy 3: Tiered AI Usage
Not every request needs GPT-4. I implemented a tiered system:
```python
def get_ai_response(query_type: str, data: dict) -> str:
    if query_type == "simple_lookup":
        # Static database lookup, no AI needed
        return lookup_maintenance_schedule(data)
    elif query_type == "basic_recommendation":
        # Use GPT-4o-mini for routine queries
        return call_openai("gpt-4o-mini", data)
    elif query_type == "complex_diagnosis":
        # Use GPT-4 only for complex problem-solving
        return call_openai("gpt-4", data)
    raise ValueError(f"Unknown query type: {query_type}")
```

Most queries (80%+) hit the "simple_lookup" tier, which costs nothing. Only genuinely complex questions reach GPT-4.
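The routing decision itself has to stay cheap, or it defeats the purpose. A hypothetical rule-based classifier (the keywords below are illustrative, not CROW's actual routing logic) might look like:

```python
ROUTINE_KEYWORDS = {"oil", "tire", "rotation", "filter", "fluid", "wiper"}
DIAGNOSTIC_KEYWORDS = {"noise", "leak", "stall", "vibration", "warning", "smoke"}

def classify_query(question: str) -> str:
    words = set(question.lower().split())
    if words & DIAGNOSTIC_KEYWORDS:
        return "complex_diagnosis"   # symptom words need real reasoning
    if words & ROUTINE_KEYWORDS:
        return "simple_lookup"       # schedule questions hit the database
    return "basic_recommendation"    # everything else goes to the cheap model

print(classify_query("When is my next oil change due?"))        # simple_lookup
print(classify_query("There's a grinding noise when braking"))  # complex_diagnosis
```

A keyword heuristic is crude, but it runs in microseconds and only has to be right often enough; anything it can't place falls through to the cheap model, not the expensive one.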
Strategy 4: Rate Limiting with Grace
Users shouldn't feel restricted, but we need to prevent abuse:
```python
from datetime import datetime, timedelta

class AIRateLimiter:
    def __init__(self, redis_client):
        self.redis = redis_client
        self.limits = {
            "free": {"requests": 10, "window": timedelta(days=1)},
            "pro": {"requests": 100, "window": timedelta(days=1)},
            "enterprise": {"requests": 1000, "window": timedelta(days=1)},
        }

    def check_limit(self, user_id: str, tier: str) -> tuple[bool, int]:
        # One counter per user per calendar day
        key = f"ai_limit:{user_id}:{datetime.now().date()}"
        current = int(self.redis.get(key) or 0)
        limit = self.limits[tier]["requests"]
        if current >= limit:
            return False, 0
        self.redis.incr(key)
        self.redis.expire(key, 86400)  # Clean up the key after a day
        return True, limit - current - 1
```
When users hit limits, we show a friendly message and suggest upgrading—not an error.
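Logic like this is easy to verify without a Redis server. Here's a condensed, self-contained version of the limiter (free tier only) wired to a tiny in-memory stub that mimics the three Redis calls it uses:

```python
from datetime import date

class FakeRedis:
    """Minimal in-memory stand-in for get/incr/expire, for testing only."""
    def __init__(self):
        self.store = {}
    def get(self, key):
        return self.store.get(key)
    def incr(self, key):
        self.store[key] = int(self.store.get(key) or 0) + 1
    def expire(self, key, seconds):
        pass  # TTL cleanup is irrelevant in a short-lived test

class AIRateLimiter:
    # Condensed copy of the limiter above, hard-coded to one tier
    def __init__(self, redis_client, daily_limit=10):
        self.redis = redis_client
        self.daily_limit = daily_limit

    def check_limit(self, user_id: str) -> tuple[bool, int]:
        key = f"ai_limit:{user_id}:{date.today()}"
        current = int(self.redis.get(key) or 0)
        if current >= self.daily_limit:
            return False, 0
        self.redis.incr(key)
        self.redis.expire(key, 86400)
        return True, self.daily_limit - current - 1

limiter = AIRateLimiter(FakeRedis())
results = [limiter.check_limit("user-1")[0] for _ in range(11)]
# First 10 calls on the free tier pass, the 11th is rejected
assert results == [True] * 10 + [False]
```

One caveat worth knowing: `get` followed by `incr` is not atomic, so two concurrent requests can both slip under the limit. For a soft product limit that's acceptable; for a hard quota you'd do the increment-and-check in a single Redis operation.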
Strategy 5: Pre-compute Where Possible
Some AI outputs can be generated in batch during off-peak hours:
```python
import asyncio

# Nightly job: pre-generate recommendations for active vehicles
async def precompute_recommendations():
    vehicles = await get_vehicles_with_upcoming_service()
    for vehicle in vehicles:
        # Generate and cache recommendations
        await get_maintenance_recommendations(vehicle)
        # Rate limit ourselves to avoid API throttling
        await asyncio.sleep(0.5)
```

When users open the app in the morning, their recommendations are already cached.
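If the nightly batch grows large, sequential half-second sleeps become the bottleneck. One option (a sketch, assuming the per-vehicle call is awaitable) is bounded concurrency with a semaphore instead of a fixed delay:

```python
import asyncio

async def precompute_batch(vehicles, worker, max_concurrent=5):
    # Allow up to max_concurrent in-flight API calls at a time
    semaphore = asyncio.Semaphore(max_concurrent)

    async def run_one(vehicle):
        async with semaphore:
            return await worker(vehicle)

    return await asyncio.gather(*(run_one(v) for v in vehicles))

# Demo with a stand-in worker instead of a real OpenAI call
async def fake_worker(vehicle):
    await asyncio.sleep(0.01)
    return f"recommendations for {vehicle}"

results = asyncio.run(precompute_batch(["car-1", "car-2", "car-3"], fake_worker))
# gather preserves input order, so results line up with vehicles
```

Five concurrent workers finish a 10,000-vehicle batch an order of magnitude faster than one request every 0.5 seconds, while still keeping a lid on API throttling.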
The Results
After implementing these strategies, monthly AI spend dropped to roughly a quarter of the naive baseline: a 75% cost reduction while actually improving user experience (cached responses come back faster).
Lessons Learned
1. Cache aggressively, but smartly
Don't cache exact inputs—cache semantic equivalents. Two cars with similar profiles should share recommendations.
2. Not everything needs AI
A surprising amount can be handled with good old-fashioned database lookups and business logic. Reserve AI for genuinely complex decisions.
3. Structure your outputs
Requesting JSON output makes responses more consistent and parseable. It also tends to be more concise.
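Structured outputs still deserve a defensive parse step, since a model can occasionally return malformed JSON. A minimal sketch (the empty-shape fallback here is an assumption for illustration, not CROW's actual handling):

```python
import json

def parse_recommendations(raw: str) -> dict:
    """Parse the model's JSON reply, falling back to a safe empty shape."""
    try:
        data = json.loads(raw)
    except json.JSONDecodeError:
        data = {}
    # Guarantee every expected key exists so downstream code never KeyErrors
    return {
        "priority": data.get("priority", []),
        "upcoming": data.get("upcoming", []),
        "notes": data.get("notes", ""),
    }

good = parse_recommendations('{"priority": ["brake pads"], "upcoming": [], "notes": "ok"}')
bad = parse_recommendations("Sorry, I can't help with that.")
assert good["priority"] == ["brake pads"]
assert bad == {"priority": [], "upcoming": [], "notes": ""}
```

This way a single malformed response degrades gracefully instead of taking the recommendations panel down with it.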
4. Monitor costs daily
I have alerts set for when daily API costs exceed thresholds. Catching a bug that causes excessive API calls early saves real money.
What's Next
I'm still exploring further optimizations. The AI landscape is evolving fast, and what's expensive today might be cheap tomorrow. The key is building systems flexible enough to swap models and strategies as the economics change.
Building AI features into your product? Let's talk about making them cost-effective.