Conversational AI meets the street.
citypal combines state-of-the-art language models, real-time spatial intelligence, and audio streaming technology to create a tour guide that actually understands you.
Why this is hard.
Real-Time Performance
Conversational AI needs to respond in under 2 seconds. At walking speed, context changes constantly. Traditional chatbots can't keep up.
Spatial Awareness
"What's that building?" only makes sense if the AI knows exactly where you are, what's visible, and what you're likely pointing at.
Contextual Memory
The AI must remember your interests, route, previous questions, and time constraints—all while maintaining natural conversation flow.
How citypal works.
Five interconnected layers working in real time to deliver contextual, conversational walking tours.
Client Layer
Audio-first React Native app optimized for background operation and low battery drain.
Intelligence Layer
Proprietary context engine that manages conversation, personalization, and fact-checking.
Knowledge Layer
Vector database with city knowledge graphs for semantic search in milliseconds.
Spatial Layer
GPS and sensor fusion to understand location, direction, and points of interest.
Audio Layer
Streaming voice synthesis with natural prosody and ultra-low latency.
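To make the Spatial Layer's idea of "what's in view" concrete, here is a minimal sketch that filters points of interest by compass bearing. It is an illustration, not citypal's implementation: the function names, the 90 degree field of view, and the sample coordinates are all assumptions, and it ignores occlusion by buildings.

```python
import math

def bearing_deg(lat1, lon1, lat2, lon2):
    """Initial great-circle bearing from point 1 to point 2, in degrees [0, 360)."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dlon = math.radians(lon2 - lon1)
    y = math.sin(dlon) * math.cos(phi2)
    x = (math.cos(phi1) * math.sin(phi2)
         - math.sin(phi1) * math.cos(phi2) * math.cos(dlon))
    return math.degrees(math.atan2(y, x)) % 360.0

def angle_diff_deg(a, b):
    """Smallest absolute difference between two compass angles, in degrees."""
    return abs((a - b + 180.0) % 360.0 - 180.0)

def pois_in_view(user_lat, user_lon, heading_deg, pois, fov_deg=90.0):
    """Return names of POIs whose bearing is within half the field of view
    on either side of the user's heading. `pois` is a list of
    (name, lat, lon) tuples; fov_deg is an assumed value."""
    half = fov_deg / 2.0
    visible = []
    for name, lat, lon in pois:
        b = bearing_deg(user_lat, user_lon, lat, lon)
        if angle_diff_deg(b, heading_deg) <= half:
            visible.append(name)
    return visible
```

A walker at the origin heading north would see a POI directly ahead, but not one due east or behind them. A production system would also need distance limits and line-of-sight checks.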
The Intelligence Layer.
Foundation models like GPT-4 are powerful, but they don't understand where you are, what you're seeing, or what you care about. Our intelligence layer bridges that gap.
Context Management
We maintain a rich, dynamic context window that includes:
- User state: Location, heading, speed, time of day
- Visible POIs: What's in view, based on GPS + compass
- Conversation history: Recent topics, unanswered questions
- Learned preferences: Architecture? Food? History?
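The four ingredients above could be held in a single structure and rendered into text for the model. The sketch below is one possible shape, under stated assumptions: the class name `TourContext`, its fields, the three-turn history limit, and the prompt wording are all hypothetical.

```python
from dataclasses import dataclass, field

@dataclass
class TourContext:
    # User state
    lat: float
    lon: float
    heading_deg: float
    speed_mps: float
    local_time: str
    # What is currently in view, recent turns, and learned interests
    visible_pois: list = field(default_factory=list)
    history: list = field(default_factory=list)
    interests: list = field(default_factory=list)
    max_history: int = 3  # assumed cap so the context stays small

    def remember(self, utterance):
        """Append a turn and keep only the most recent max_history turns."""
        self.history.append(utterance)
        self.history = self.history[-self.max_history:]

    def to_prompt(self):
        """Render the context as plain text for the language model.
        Raw coordinates are left out of the text; that is this sketch's choice."""
        lines = [
            f"User: heading {self.heading_deg:.0f} deg, "
            f"{self.speed_mps:.1f} m/s, local time {self.local_time}",
            "In view: " + (", ".join(self.visible_pois) or "nothing identified"),
            "Recent turns: " + (" | ".join(self.history) or "none"),
            "Interests: " + (", ".join(self.interests) or "unknown"),
        ]
        return "\n".join(lines)
```

Because the context is rebuilt on every turn, it can follow the walker as heading and visible POIs change.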
Semantic Retrieval
When you ask a question, we don't just search keywords—we understand intent:
- Vector search: Find semantically similar stories/facts
- Spatial filtering: Prioritize nearby, visible locations
- Personalization: Surface content matching your interests
- RAG architecture: Inject relevant facts into LLM context
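The four retrieval steps above can be sketched as a single ranking function. This is a toy illustration, not the production engine: the embedding vectors are hand-written, the 300 m radius and the 0.2 and 0.1 weights are arbitrary assumptions, and a real system would use a vector database rather than a linear scan.

```python
import math

def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0

def haversine_m(lat1, lon1, lat2, lon2):
    """Great-circle distance in metres."""
    r = 6371000.0
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    h = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2)
    return 2 * r * math.asin(math.sqrt(h))

def rank_facts(query_vec, user_pos, interests, facts, radius_m=300.0, top_k=2):
    """Score facts by semantic similarity, proximity, and interest match."""
    scored = []
    for fact in facts:
        dist = haversine_m(user_pos[0], user_pos[1], fact["lat"], fact["lon"])
        if dist > radius_m:
            continue  # spatial filtering: drop anything too far away
        score = cosine(query_vec, fact["vec"])           # vector search
        score += 0.2 * max(0.0, 1.0 - dist / radius_m)   # nearer is better
        if fact["topic"] in interests:
            score += 0.1                                 # personalization
        scored.append((score, fact["text"]))
    scored.sort(key=lambda s: s[0], reverse=True)
    return [text for _, text in scored[:top_k]]

def build_prompt(question, retrieved):
    """RAG step: inject the retrieved facts into the model's context."""
    context = "\n".join(f"- {t}" for t in retrieved)
    return f"Answer using only these facts:\n{context}\n\nQuestion: {question}"
```

Filtering by distance before scoring keeps far-away content out of the prompt entirely, however similar it is to the question.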
Hallucination Prevention
LLMs can "hallucinate" false facts. Our system prevents this:
- Fact-checking layer: Verify claims against knowledge graph
- Source attribution: Track where each fact comes from
- Confidence scoring: AI admits when it doesn't know
- Human review: Local experts curate and verify content
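As a rough illustration of how the first three checks might fit together, the sketch below verifies a claim against a small in-memory "knowledge graph" and returns a verdict with its source. The `Fact` type, the 0.8 confidence threshold, the verdict labels, and the source string are all hypothetical.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Fact:
    subject: str
    predicate: str
    value: str
    source: str        # where the fact came from (source attribution)
    confidence: float  # curator-assigned score in [0, 1] (assumed scale)

def check_claim(subject, predicate, value, graph, min_confidence=0.8):
    """Compare a generated claim with the curated fact, if one exists."""
    fact = graph.get((subject, predicate))
    if fact is None:
        # Nothing to verify against: admit it rather than guess.
        return {"verdict": "unknown", "source": None}
    if fact.confidence < min_confidence:
        return {"verdict": "uncertain", "source": fact.source}
    if fact.value != value:
        return {"verdict": "contradicted", "source": fact.source,
                "correct_value": fact.value}
    return {"verdict": "supported", "source": fact.source}

# Illustrative graph with a single entry; the source label is made up.
GRAPH = {
    ("Eiffel Tower", "completed"): Fact(
        "Eiffel Tower", "completed", "1889", "curated:example-entry", 0.95),
}
```

A response would only state claims that come back "supported"; for "unknown" or "uncertain" the assistant would say it is not sure.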
Personality & Voice
citypal isn't just accurate—it's engaging and human:
- Tone adaptation: Playful, scholarly, or practical based on you
- Story selection: Choose anecdotes over dry facts
- Natural pacing: Pauses, emphasis, conversational flow
- Local voices: City-specific personalities and accents
Low-latency audio streaming.
Traditional text-to-speech waits for the entire response before speaking. That's too slow for natural conversation.
We use streaming synthesis: citypal starts speaking within 500ms of your question, even while the AI is still generating the rest of the response.
The Result:
- <2s from question to first word spoken
- Interruption support: Stop AI mid-sentence to ask follow-up
- Adaptive bitrate: Works on 3G, optimized for 5G
- Background mode: Phone locked, audio continues
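One common way to start speaking before generation finishes is to cut the model's token stream into sentences and hand each one to the synthesizer as soon as it is complete. The sketch below shows only that chunking step, and it is an assumption about the approach rather than a description of citypal's pipeline. The function name and the simple punctuation rule are illustrative.

```python
import re

# A sentence boundary: whitespace that follows ., ! or ?
_BOUNDARY = re.compile(r"(?<=[.!?])\s+")

def sentences_from_tokens(tokens):
    """Yield each complete sentence as soon as its end arrives in the stream.

    `tokens` is any iterable of text fragments, for example the incremental
    output of a language model.
    """
    buffer = ""
    for token in tokens:
        buffer += token
        parts = _BOUNDARY.split(buffer)
        # Every part except the last is a finished sentence.
        for sentence in parts[:-1]:
            yield sentence
        buffer = parts[-1]
    # Flush whatever is left when the stream ends.
    if buffer.strip():
        yield buffer.strip()
```

Because this is a generator, the first sentence is available after only the first few tokens, so audio for it can begin while the rest is still being generated. Interruption support would then amount to the consumer stopping iteration when the user starts talking.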
[Diagram: Audio Pipeline]
Privacy-first. Performance-obsessed.
Privacy by Design
- Location Never Leaves Device: GPS coordinates processed locally. Server only sees city-level data.
- Ephemeral Conversations: Audio not stored. Transcripts auto-delete after session.
- No Third-Party Tracking: No analytics SDKs, no ad networks, no data brokers.
- Anonymous by Default: No login required for basic tours. Data tied to device, not identity.
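To illustrate what "server only sees city-level data" could mean in practice, here is a minimal sketch in which the device maps its coordinates to a city name and sends only that. Everything here is an assumption: the city table, the function names, and the payload shape are hypothetical, and the sketch presumes that nearby POIs are resolved on the device from downloaded city data.

```python
import math

# Small on-device table of city centres (illustrative subset).
CITY_CENTERS = {
    "paris": (48.8566, 2.3522),
    "london": (51.5074, -0.1278),
}

def nearest_city(lat, lon, centers=CITY_CENTERS):
    """Pick the closest city centre using a flat-earth approximation,
    which is adequate for choosing between cities far apart."""
    def approx_dist_sq(center):
        clat, clon = center
        dlat = lat - clat
        dlon = (lon - clon) * math.cos(math.radians(lat))
        return dlat * dlat + dlon * dlon
    return min(centers, key=lambda name: approx_dist_sq(centers[name]))

def server_payload(lat, lon, question):
    """Build the request body. Precise coordinates stay on the device;
    only the coarse city label is included."""
    return {"city": nearest_city(lat, lon), "question": question}
```

For a user standing near the Eiffel Tower, the payload would carry `"city": "paris"` and the question text, with no latitude or longitude fields.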
Optimized Performance
- Edge Computing: Content cached at CDN edge nodes. <50ms latency worldwide.
- Offline Mode: Download cities ahead of time. Core features work without network.
- Battery Optimization: 4+ hours active use. Background GPS managed intelligently.
- Adaptive Quality: Degrades gracefully on slow connections. Never stops working.
Why we're ahead.
Foundation models are commoditized. Our competitive advantage is the intelligence layer and years of R&D on spatial AI.
Spatial-First Architecture
Built from the ground up for location-aware AI. Not a chatbot with GPS bolted on—every system designed for spatial context.
Curated Knowledge
Local experts, historians, and storytellers contribute content. Not scraped Wikipedia—verified, compelling narratives.
R&D in Conversational UX
Years invested in how people actually talk while walking. Interruption handling, pause detection, natural turn-taking.
Interested in the tech?
We're hiring engineers, researchers, and technical leaders. Or if you're an investor, let's talk about our technical moat.