Scale or Die at Accelerate 2025: Messari x Solana Dev
By accelerate-25
Published on 2025-05-20
Messari's Diran Li shares insights on building data-driven applications on Solana, focusing on data curation, AI integration, and scalable solutions.
Summary
At Accelerate 2025, Diran Li from Messari presented a compelling talk on the importance of scalability and data-driven development in the Solana ecosystem. Li outlined the challenges faced by developers in providing curated, high-signal data at scale, including data fragmentation, ingestion difficulties, and the need to process multiple data types simultaneously.
Messari's journey from simple ETL pipelines to sophisticated ELT processes was highlighted, emphasizing the importance of storing raw data and implementing robust data observability practices. Li stressed the significance of proper data engineering in AI pipelining, enabling advanced techniques like fine-tuning and model training.
The presentation introduced two powerful tools for Solana developers: the Signal dataset for evaluating data curation pipelines, and the AI toolkit, which brings crypto knowledge into a single assistant. These tools, leveraging Messari's vast data warehouse, aim to provide real-time, source-grounded insights to enhance Solana-based protocols and applications.
Key Points:
Data Curation Challenges
Diran Li began by addressing the common challenges faced by developers in the blockchain space, particularly when it comes to providing valuable insights to users. He highlighted three main issues: data fragmentation across the industry and ecosystem, difficulties in data ingestion due to the noisy nature of blockchain data, and the complexity of producing data at scale from multiple sources simultaneously. These challenges often result in developers juggling numerous tabs and sources to make sense of daily blockchain activities.
Messari's Data Engineering Evolution
Li provided a brief history of Messari's data engineering journey, starting from simple ETL (Extract, Transform, Load) pipelines in 2018 to more complex systems. As demand grew, they added more jobs, services, and databases, which eventually led to issues with data fragmentation and identifying the source of truth. The proliferation of Large Language Models (LLMs) and AI in 2022 further complicated their data curation efforts, prompting a shift in their approach.
ELT: A Game-Changing Approach
One of the key lessons Li shared was Messari's transition from ETL to ELT (Extract, Load, Transform) processes. This shift involved always storing raw data before transformation, allowing for a more traceable and reproducible data pipeline. This approach enables easier error detection and correction, as transformations can be replayed from specific points in the process. Li emphasized this as a crucial learning for anyone working with large-scale data transformation.
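The raw-first principle Li describes can be sketched in a few lines. This is a minimal illustration, not Messari's implementation: a dict stands in for raw storage (a real pipeline would persist to an object store), and the transform reads only from the raw layer so it can always be replayed.

```python
import json

# Minimal ELT sketch. The stores and record shape are illustrative assumptions,
# not Messari's actual pipeline; a real system would persist raw payloads to
# durable storage (e.g. an object store) before any transformation.
raw_store = {}    # immutable raw layer: load first, transform later
clean_store = {}  # derived layer, always reproducible from raw

def extract_and_load(source_id, payload):
    """The 'E' and 'L' of ELT: store exactly what arrived, untouched."""
    raw_store[source_id] = json.dumps(payload)

def transform(source_id):
    """The 'T' reads only from the raw layer, so it can be replayed any time."""
    record = json.loads(raw_store[source_id])
    clean_store[source_id] = {
        "symbol": record["symbol"].upper(),
        "price": float(record["price"]),
    }

extract_and_load("tx-1", {"symbol": "sol", "price": "171.5"})
transform("tx-1")
# If the transform turns out to be buggy, the fix is replayed over the raw
# copy, because the raw payload was never mutated.
```

The key design choice is that `clean_store` is disposable: because every derived record can be rebuilt from `raw_store`, a bad transformation never destroys information.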
Data Observability and AI Integration
Li stressed the importance of data observability in managing complex pipelines. He showcased an example of a backend job, illustrating how clear visualization of data flows, job schedules, and dependencies can significantly improve data management. Furthermore, Li highlighted the connection between effective AI pipelining and robust data engineering practices. This includes the ability to process data at scale, maintain good data lineage, and ensure proper sourcing and citation for AI-generated insights.
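The kind of visibility Li describes starts with modeling jobs and their dependencies explicitly. A hedged sketch, with job names that are purely illustrative (not Messari's actual pipeline): the DAG is declared as data, jobs run in dependency order, and each run is logged so the flow can be inspected.

```python
from graphlib import TopologicalSorter

# Illustrative job graph: each job maps to the set of jobs it depends on.
# The names are assumptions for the sake of the example.
jobs = {
    "ingest_market_data": set(),
    "ingest_news": set(),
    "transform_prices": {"ingest_market_data"},
    "compute_sentiment": {"ingest_news", "transform_prices"},
}

run_log = []

def run_pipeline(dag):
    """Execute jobs in dependency order, recording status for observability.

    A production system would also record timings, row counts, and failures,
    which is what makes the visualizations Li showed possible.
    """
    for job in TopologicalSorter(dag).static_order():
        run_log.append((job, "success"))

run_pipeline(jobs)
```

Declaring the dependency graph as data, rather than burying it in job code, is what lets an orchestrator render schedules and lineage at a glance.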
Developer Tools and Resources
The presentation concluded with an introduction to two powerful tools Messari is making available to Solana developers. The Signal dataset allows for evaluation of data curation pipelines, leveraging the entire data warehouse to provide AI insights on trending topics, key opinion leaders, and asset sentiment. The AI toolkit, described as bringing all crypto knowledge into one assistant, offers real-time, source-grounded answers via API, including citations, tables, and charts. Li emphasized that these tools are freely available for Solana developers to try and integrate into their projects.
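A source-grounded answer of the kind the AI toolkit returns pairs every claim with its citations. The sketch below is a hypothetical illustration of consuming such a response: the response shape, field names, and sample data are assumptions, not Messari's documented API contract, which developers should consult directly.

```python
import json

# Hypothetical response body illustrating a source-grounded answer:
# an answer string plus the citations that back it. Field names are
# assumptions, not Messari's actual schema.
SAMPLE_RESPONSE = json.dumps({
    "answer": "Solana DEX volume rose this week.",
    "citations": [
        {"title": "Exchange Recap", "url": "https://example.com/recap"},
    ],
})

def parse_answer(body):
    """Keep citations next to the answer so every claim stays source-grounded."""
    data = json.loads(body)
    return data["answer"], [c["url"] for c in data["citations"]]

answer, sources = parse_answer(SAMPLE_RESPONSE)
```

Surfacing the `sources` list alongside the answer in an application is what distinguishes a grounded assistant from one that asserts facts without provenance.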
Facts + Figures
- Messari has built a Solana portal providing insights on token unlocks, news, research, fundraising, and key events.
- The company's data warehouse contains 170 terabytes of curated data.
- Messari's journey in data engineering began in 2018 with simple ETL pipelines.
- The shift from ETL to ELT occurred around 2022, coinciding with the proliferation of LLMs and AI.
- Messari's AI toolkit pulls from 170 terabytes of curated data to provide real-time, source-grounded answers.
- The AI toolkit is being integrated into projects like Coinbase AI agent kit and Eliza OS.
- Messari offers a free tier for every developer on Solana to try out their API.
Top quotes
- "Scale or die. It's incredible to be among so many talented engineers who are pushing the boundaries on what's possible on Solana."
- "Data is fragmented. It's fragmented across the industry. It's fragmented across the ecosystem."
- "Everything changed in about 2022 where LLM started proliferating and AI became very popular."
- "We always store the raw data. No matter how big that is, we always store the raw data and then transform afterwards."
- "Doing AI pipelining well means doing data engineering well."
- "The AI toolkit brings all of crypto knowledge into one assistant."
Questions Answered
What are the main challenges in providing curated, high-signal data at scale in the blockchain space?
The main challenges include data fragmentation across the industry and ecosystem, difficulties in data ingestion due to the noisy nature of blockchain data, and the complexity of producing data at scale from multiple sources simultaneously. These issues make it challenging for developers to provide valuable insights to users without sifting through numerous sources and data points.
How has Messari's approach to data engineering evolved over time?
Messari started with simple ETL (Extract, Transform, Load) pipelines in 2018, primarily focusing on ingesting market data. As demand grew, they added more jobs, services, and databases, which eventually led to a complex system with data fragmentation issues. In 2022, with the rise of LLMs and AI, Messari shifted to an ELT (Extract, Load, Transform) approach, emphasizing the storage of raw data before transformation to improve traceability and error correction.
What is the significance of switching from ETL to ELT in data processing?
The switch from ETL to ELT is significant because it allows for better data lineage and error correction. By always storing raw data before transformation, Messari can trace the history of data transformations more easily. This approach enables them to locate errors more efficiently and replay transformations from specific points, ensuring a more robust and reliable data pipeline.
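The error-correction benefit can be made concrete. In this illustrative sketch (data and functions are assumptions, not Messari's code), a buggy transform silently drops cents; because the raw records were stored untouched, the corrected transform simply replays over them with no data loss.

```python
import json

# Raw layer: stored exactly as received, never mutated. Illustrative data.
raw_records = ['{"price": "171.5"}', '{"price": "180.0"}']

def buggy_transform(raw):
    """First attempt: truncates to an integer, silently losing cents."""
    return {"price": int(float(json.loads(raw)["price"]))}

def fixed_transform(raw):
    """Corrected version: preserves the full price."""
    return {"price": float(json.loads(raw)["price"])}

first_pass = [buggy_transform(r) for r in raw_records]
# The raw layer is intact, so the fix is just a replay over the same inputs:
replayed = [fixed_transform(r) for r in raw_records]
```

Under ETL, where only the transformed output survives, the truncated prices would be unrecoverable; under ELT the replay is routine.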
How does Messari's AI toolkit benefit Solana developers?
Messari's AI toolkit brings all crypto knowledge into one assistant, leveraging 170 terabytes of curated data. It provides real-time, source-grounded answers via API, including citations, tables, and charts. This tool allows Solana developers to integrate crypto intelligence into their protocols and applications, enhancing their functionality and user experience. Messari offers a free tier for Solana developers to try out the API and integrate it into their projects.
What is the Signal dataset, and how does it help in evaluating data curation pipelines?
The Signal dataset is a tool provided by Messari that leverages their entire data warehouse to provide AI insights on trending topics, key opinion leaders, and asset sentiment. It helps evaluate the output of data curation pipelines by offering a comprehensive view of what the community is talking about most, which key opinion leaders and assets are gaining mindshare, and overall sentiment trends. This tool is crucial for developers looking to build data-driven applications on Solana.