Juncheng Yang - Assistant Professor in Harvard John A. Paulson School of Engineering and Applied Sciences

Juncheng Yang

Assistant Professor

Harvard John A. Paulson School of Engineering and Applied Sciences

juncheng@g.harvard.edu

SEC 4.410, 150 Western Ave, Allston, MA 02134

About

I am an Assistant Professor at the Harvard John A. Paulson School of Engineering and Applied Sciences. I received my Ph.D. in Computer Science from Carnegie Mellon University. I am broadly interested in computer systems, with a particular focus on workload analysis and on efficient, reliable, and sustainable storage and machine learning systems. I perform in-depth measurement and analysis to gain a deep understanding of systems and algorithms in the real world. Leveraging insights from these measurements, we design and build next-generation storage systems and distributed machine learning systems.

My work has received best-paper awards at NSDI'24, NSDI'21, SOSP'21, VALUETOOLS'24, and SYSTOR'16 and has been deployed in production at Google, VMware, Twitter, Redpanda, and Momento, with many open-source libraries contributed by the community. My research has been sponsored by Meta, Google Cloud, and AWS. I am a 2020 Meta Fellow, a 2023 Google Cloud Research Innovator, and a 2023 Rising Star in Machine Learning and Systems.

News

04/2024   SIEVE received a community award at NSDI'24, was featured in the TLDR newsletter, and was discussed in a blog post by Marc. Many open-source libraries (in more than 12 languages) are available on GitHub.
12/2023   S3-FIFO was discussed in Aleksey's Online Reading Group, covered by blogs in English, Chinese, Korean, and Japanese, used in UIUC CS525, and implemented at Google, VMware, and Redpanda and in many open-source libraries.

Research Areas and Interests

Storage systems and machine learning systems with a focus on efficiency, scalability and robustness:

  • Efficient and scalable cache management systems
  • Robust and reliable cache/storage management and machine learning systems [OSDI'20][NSDI'22][VLDB'23]
  • New approaches to make machine learning practical for storage systems (machine learning for systems) [FAST'23][SOCC'17]
  • Performance optimization and sustainability of microservices and serverless architecture [SOCC'23]
  • Reliable large model inference on wimpy hardware (system for machine learning)

Research Highlights

  • SIEVE (NSDI'24): the first cache eviction algorithm that is simpler than LRU yet more effective than state-of-the-art algorithms for web caches. Adopted by software and systems such as Android API, BIND 9, ImmuDB, TiDB, and PostgREST. Implemented in many open-source libraries in languages including Golang, Python, JavaScript, Rust, Java, Swift, Ruby, Nim, and Zig. Find more details here.
  • S3-FIFO (SOSP'23): a simple and scalable cache eviction algorithm composed of only FIFO queues. Implemented or deployed at companies including Google, VMware, and Redpanda, and in many open-source libraries. Find more details here.
  • Segcache (NSDI'21): received a community best-paper award and was deployed at Twitter and Momento.

Bio

Juncheng Yang is an Assistant Professor at the Harvard John A. Paulson School of Engineering and Applied Sciences. He received his Ph.D. in Computer Science from Carnegie Mellon University in 2024. His research interests broadly cover the efficiency, performance, reliability, and sustainability of large-scale data systems.

Juncheng's work has received best-paper awards at VALUETOOLS'24, NSDI'24, NSDI'21, SOSP'21, and SYSTOR'16. His OSDI'20 paper was recognized as one of the best storage papers at the conference and invited to ACM TOS'21. Juncheng received a Facebook Ph.D. Fellowship in 2020 and was recognized in 2023 as both a Rising Star in Machine Learning and Systems and a Google Cloud Research Innovator.

His work, Segcache, has been adopted for production at Twitter and Momento. The two eviction algorithms he designed (S3-FIFO, SIEVE) have been adopted for production at Google, VMware, Redpanda, and several others, with over 20 open-source libraries available on GitHub. Moreover, the open-source cache simulation library he created, libCacheSim, has been used by almost 100 research institutes and companies.