Scale or Die at Accelerate 2025: Messari x Solana Dev

By accelerate-25

Published on 2025-05-20

Messari's Diran Li shares insights on building data-driven applications on Solana, focusing on data curation, AI integration, and scalable solutions.

The notes below are AI generated and may not be 100% accurate. Watch the video to be sure!

Messari's Diran Li unveils tools and techniques for Solana developers, showing how AI-powered insights and scalable data engineering practices can power data-driven applications.

Summary

At Accelerate 2025, Diran Li from Messari presented a compelling talk on the importance of scalability and data-driven development in the Solana ecosystem. Li outlined the challenges faced by developers in providing curated, high-signal data at scale, including data fragmentation, ingestion difficulties, and the need to process multiple data types simultaneously.

Messari's journey from simple ETL pipelines to sophisticated ELT processes was highlighted, emphasizing the importance of storing raw data and implementing robust data observability practices. Li stressed the significance of proper data engineering in AI pipelining, enabling advanced techniques like fine-tuning and model training.

The presentation introduced two powerful tools for Solana developers: the Signal dataset for evaluating data curation pipelines, and the AI toolkit, which brings crypto knowledge into a single assistant. These tools, leveraging Messari's vast data warehouse, aim to provide real-time, source-grounded insights to enhance Solana-based protocols and applications.

Key Points:

Data Curation Challenges

Diran Li began by addressing the common challenges faced by developers in the blockchain space, particularly when it comes to providing valuable insights to users. He highlighted three main issues: data fragmentation across the industry and ecosystem, difficulties in data ingestion due to the noisy nature of blockchain data, and the complexity of producing data at scale from multiple sources simultaneously. These challenges often result in developers juggling numerous tabs and sources to make sense of daily blockchain activities.

Messari's Data Engineering Evolution

Li provided a brief history of Messari's data engineering journey, from simple ETL (Extract, Transform, Load) pipelines in 2018 to increasingly complex systems. As demand grew, they added more jobs, services, and databases, which eventually led to data fragmentation and difficulty identifying the source of truth. The proliferation of Large Language Models (LLMs) and AI in 2022 further complicated their data curation efforts, prompting a shift in approach.

ELT: A Game-Changing Approach

One of the most significant revelations shared by Li was Messari's transition from ETL to ELT (Extract, Load, Transform) processes. This shift involved always storing raw data before transformation, allowing for a more traceable and reproducible data pipeline. This approach enables easier error detection and correction, as transformations can be replayed from specific points in the process. Li emphasized this as a crucial learning for anyone working with large-scale data transformation.
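To make the distinction concrete, here is a minimal sketch of the ELT pattern described above: raw payloads are persisted verbatim before any transformation, so a buggy transform can later be fixed and replayed from the stored raw data without re-ingesting anything. The storage layout, field names, and transform logic are illustrative assumptions, not Messari's actual pipeline.

```python
import json
import time
from pathlib import Path

RAW_DIR = Path("raw")  # hypothetical landing zone for untouched raw records


def extract_and_load(record: dict) -> Path:
    """Extract + Load: persist the raw record verbatim before any transformation."""
    RAW_DIR.mkdir(exist_ok=True)
    path = RAW_DIR / f"{int(time.time() * 1e6)}.json"
    path.write_text(json.dumps(record))
    return path


def transform(raw_path: Path) -> dict:
    """Transform: derive curated fields from the stored raw record.

    Because the raw data is never discarded, this step can be re-run
    (replayed) after fixing a bug, starting from any point in history.
    """
    record = json.loads(raw_path.read_text())
    return {
        "slot": record.get("slot"),
        "program": record.get("programId"),
        "lamports": int(record.get("lamports", 0)),
    }


def replay(transform_fn) -> list[dict]:
    """Replay a (possibly corrected) transform over every stored raw file."""
    return [transform_fn(p) for p in sorted(RAW_DIR.glob("*.json"))]


if __name__ == "__main__":
    raw = extract_and_load({"slot": 123, "programId": "ExampleProgram111", "lamports": "5000"})
    print(transform(raw))
    print(replay(transform))
```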

Data Observability and AI Integration

Li stressed the importance of data observability in managing complex pipelines. He showcased an example of a backend job, illustrating how clear visualization of data flows, job schedules, and dependencies can significantly improve data management. Furthermore, Li highlighted the connection between effective AI pipelining and robust data engineering practices. This includes the ability to process data at scale, maintain good data lineage, and ensure proper sourcing and citation for AI-generated insights.
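The talk did not share pipeline code, but the idea of making job schedules, data flows, and dependencies explicit can be illustrated with a minimal Apache Airflow-style DAG. The pipeline name, task names, and schedule below are assumptions for illustration, not Messari's backend.

```python
from datetime import datetime

from airflow import DAG
from airflow.operators.python import PythonOperator


def ingest_raw():
    """Extract + Load: pull raw on-chain and off-chain data and store it untouched."""


def transform_curated():
    """Transform: build curated, high-signal tables from the stored raw data."""


def publish_ai_features():
    """Expose curated data to downstream AI pipelines (fine-tuning, retrieval)."""


with DAG(
    dag_id="elt_pipeline_example",  # hypothetical pipeline name
    start_date=datetime(2025, 1, 1),
    schedule="@hourly",
    catchup=False,
) as dag:
    ingest = PythonOperator(task_id="ingest_raw", python_callable=ingest_raw)
    transform = PythonOperator(task_id="transform_curated", python_callable=transform_curated)
    publish = PythonOperator(task_id="publish_ai_features", python_callable=publish_ai_features)

    # Explicit dependencies give the observable lineage described above:
    # ingest -> transform -> publish, each visible in the scheduler UI.
    ingest >> transform >> publish
```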

Developer Tools and Resources

The presentation concluded with an introduction to two powerful tools Messari is making available to Solana developers. The Signal dataset allows for evaluation of data curation pipelines, leveraging the entire data warehouse to provide AI insights on trending topics, key opinion leaders, and asset sentiment. The AI toolkit, described as bringing all crypto knowledge into one assistant, offers real-time, source-grounded answers via API, including citations, tables, and charts. Li emphasized that these tools are freely available for Solana developers to try and integrate into their projects.
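As a rough illustration of how a Solana developer might consume such an API, the snippet below sends a question and prints the source-grounded response. The endpoint URL, header name, and response shape are assumptions for illustration only and should be checked against Messari's API documentation.

```python
import os

import requests

# Hypothetical endpoint and auth header -- consult Messari's API docs for the real ones.
API_URL = "https://api.messari.io/ai/v1/chat/completions"
API_KEY = os.environ["MESSARI_API_KEY"]


def ask(question: str) -> dict:
    """Send a question to the AI toolkit and return the raw JSON response."""
    resp = requests.post(
        API_URL,
        headers={"x-messari-api-key": API_KEY},
        json={"messages": [{"role": "user", "content": question}]},
        timeout=30,
    )
    resp.raise_for_status()
    return resp.json()  # expected to contain the answer plus source citations


if __name__ == "__main__":
    print(ask("What are the top trending topics in the Solana ecosystem today?"))
```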

Facts + Figures

  • Messari has built a Solana portal providing insights on token unlocks, news, research, fundraising, and key events.
  • The company's data warehouse contains 170 terabytes of curated data.
  • Messari's journey in data engineering began in 2018 with simple ETL pipelines.
  • The shift from ETL to ELT occurred around 2022, coinciding with the proliferation of LLMs and AI.
  • Messari's AI toolkit pulls from 170 terabytes of curated data to provide real-time, source-grounded answers.
  • The AI toolkit is being integrated into projects like Coinbase AI agent kit and Eliza OS.
  • Messari offers a free tier for every developer on Solana to try out their API.

Top quotes

  1. "Scale or die. It's incredible to be among so many talented engineers who are pushing the boundaries on what's possible on Solana."
  2. "Data is fragmented. It's fragmented across the industry. It's fragmented across the ecosystem."
  3. "Everything changed in about 2022 where LLM started proliferating and AI became very popular."
  4. "We always store the raw data. No matter how big that is, we always store the raw data and then transform afterwards."
  5. "Doing AI pipelining well means doing data engineering well."
  6. "The AI toolkit brings all of crypto knowledge into one assistant."

Questions Answered

What are the main challenges in providing curated, high-signal data at scale in the blockchain space?

The main challenges include data fragmentation across the industry and ecosystem, difficulties in data ingestion due to the noisy nature of blockchain data, and the complexity of producing data at scale from multiple sources simultaneously. These issues make it challenging for developers to provide valuable insights to users without sifting through numerous sources and data points.

How has Messari's approach to data engineering evolved over time?

Messari started with simple ETL (Extract, Transform, Load) pipelines in 2018, primarily focusing on ingesting market data. As demand grew, they added more jobs, services, and databases, which eventually led to a complex system with data fragmentation issues. In 2022, with the rise of LLMs and AI, Messari shifted to an ELT (Extract, Load, Transform) approach, emphasizing the storage of raw data before transformation to improve traceability and error correction.

What is the significance of switching from ETL to ELT in data processing?

The switch from ETL to ELT is significant because it allows for better data lineage and error correction. By always storing raw data before transformation, Messari can trace the history of data transformations more easily. This approach enables them to locate errors more efficiently and replay transformations from specific points, ensuring a more robust and reliable data pipeline.

How does Messari's AI toolkit benefit Solana developers?

Messari's AI toolkit brings all crypto knowledge into one assistant, leveraging 170 terabytes of curated data. It provides real-time, source-grounded answers via API, including citations, tables, and charts. This tool allows Solana developers to integrate crypto intelligence into their protocols and applications, enhancing their functionality and user experience. Messari offers a free tier for Solana developers to try out the API and integrate it into their projects.

What is the Signal dataset, and how does it help in evaluating data curation pipelines?

The Signal dataset is a tool provided by Messari that leverages their entire data warehouse to provide AI insights on trending topics, key opinion leaders, and asset sentiment. It helps evaluate the output of data curation pipelines by offering a comprehensive view of what the community is talking about most, which key opinion leaders and assets are gaining mindshare, and overall sentiment trends. This tool is crucial for developers looking to build data-driven applications on Solana.


Related Content

Scale or Die at Accelerate 2025: Welcome to Scale or Die: Day 2

Day 2 of Scale or Die event focuses on infrastructure and dev tooling with workshops and summits

Ship or Die at Accelerate 2025: Time Is Money (Kawz - Time.fun)

Kawz introduces Time.fun, a platform that tokenizes time and creates new capital markets on Solana

Scale or Die at Accelerate 2025: Solver Infrastructure

RockawayX Labs' Krystof Kosina discusses the challenges and solutions in developing cross-chain solvers on Solana

Ship or Die at Accelerate 2025: Lightning Talk: Vana

Anna Kazlauskas discusses data as an asset class and its role in AI, introducing Vana's innovative approach to data ownership and monetization.

Scale or Die at Accelerate 2025: The State of Solana MEV

An in-depth look at MEV on Solana, focusing on sandwich attacks and their impact on the ecosystem

Ship or Die at Accelerate 2025: Advancing Solana DeFi Innovation

OKX announces major developments for Solana, including XBTC integration and increased wallet usage

Ship or Die at Accelerate 2025: Compliant Onchain Products

Taylor Johnson of Exo Technologies discusses building compliant on-chain products on Solana, focusing on tokenization, compliance, and real-world asset integration.

Ship or Die at Accelerate 2025: Kraken

Kraken and Backed announce X stocks, bringing tokenized equities to Solana with permissionless, self-custody trading

Ship or Die at Accelerate 2025: Sergio Mello - Anchorage Digital

Anchorage Digital introduces innovative solutions for institutional DeFi participation on Solana

Scale or Die Accelerate 2025: Node Consensus Networks with Jito Restaking

Evan Batsell from Jito Labs explains how to build Node Consensus Networks (NCNs) using Jito restaking on Solana.

Scale or Die 2025: No-strings-attached programs w/ Pinocchio

Fernando Otero introduces Pinocchio, a new dependency-free SDK for writing efficient Solana programs

Scale or Die at Accelerate 2025: SVMKit: Solana Infrastructure as Code

Alexander Guy introduces SVMKit, a revolutionary tool for deploying and managing Solana infrastructure as code

Ship or Die at Accelerate 2025: Tokenizing Trust

Apollo's flagship credit fund tokenizes on Solana, offering enhanced yields through DeFi integration

Ship or Die 2025: Solana Attestation Service

Solana Attestation Service launches on mainnet, enabling seamless KYC and identity verification for blockchain applications

Scale or Die at Accelerate 2025: Writing Optimized Solana Programs

Dean Little from Blueshift delivers an in-depth exploration of Solana program optimization techniques at Accelerate 2025.