Ryosuke Matsuda

M1 · Tohoku University

Hello and welcome! I am a first-year master's student and a member of the Tohoku NLP Group and FaiLab. My research interests center on large vision-language models (LVLMs).
Currently, I am working on two research topics: Shitsukan (material perception) and text-to-video (T2V) generation models.

Interests
Vision & Language · Natural Language Processing · Large Vision-Language Models · Shitsukan (Material Perception) · Text-to-Video Models
Biography
2003.07
Born
Iwate, Japan
2022.04
Tohoku University — Bachelor
Department of Electrical, Information and Physics Engineering, School of Engineering
2025.04
Tohoku University — Master
Fundamental Artificial Intelligence, Department of System Information Sciences, Graduate School of Information Sciences
Publications
[1] SLVMEval: Synthetic Meta Evaluation Benchmark for Text-to-Long Video Generation
Ryosuke Matsuda, Keito Kudo, Haruto Yoshida, Nobuyuki Shimizu, Jun Suzuki
The IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR 2026), Denver, CO, June 2026.
Abstract
We introduce SLVMEval, a benchmark for meta-evaluating text-to-video (T2V) evaluation systems. SLVMEval focuses on assessing these systems on long videos of up to 10,486 seconds (approximately 3 hours). Our benchmark targets a fundamental requirement: whether systems can accurately judge video quality in settings that are easy for humans to assess. We adopt a pairwise comparison-based meta-evaluation framework. Building on dense video captioning datasets, we synthetically degrade source videos to create controlled "high-quality vs. low-quality" pairs across 10 distinct aspects. We then use crowdsourcing to filter and retain only those pairs in which the degradation is clearly perceptible, thereby establishing the final testbed. Using this testbed, we assess the reliability of existing evaluation systems in ranking these pairs. Our experiments show that human evaluators identify the better long video with 84.7%–96.8% accuracy, while in 9 of the 10 aspects, the accuracy of these systems falls short of human judgment, revealing weaknesses in text-to-long video evaluation.
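The pairwise meta-evaluation protocol above reduces to a simple accuracy computation: an evaluation system is credited whenever it scores the high-quality video above its degraded counterpart. The sketch below illustrates this idea only; the `score` function and the toy pair data are hypothetical stand-ins, not the actual SLVMEval implementation.

```python
# Minimal sketch of pairwise meta-evaluation: a T2V evaluation system
# is scored by how often it ranks the high-quality video above the
# synthetically degraded one.

def pairwise_accuracy(pairs, score):
    """pairs: iterable of (high_quality_id, low_quality_id) tuples.
    score: maps a video identifier to a scalar quality estimate."""
    pairs = list(pairs)
    correct = sum(1 for hi, lo in pairs if score(hi) > score(lo))
    return correct / len(pairs)

# Toy example with a hypothetical system that looks up stored scores.
toy_scores = {"v1_hi": 0.9, "v1_lo": 0.4, "v2_hi": 0.7, "v2_lo": 0.8}
toy_pairs = [("v1_hi", "v1_lo"), ("v2_hi", "v2_lo")]
print(pairwise_accuracy(toy_pairs, toy_scores.get))  # 0.5
```

Human annotators evaluated with the same protocol set the reference accuracy (84.7%–96.8% in the paper), against which automatic systems are compared per aspect.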
[2] A Meta-Evaluation Benchmark for Long Video Generation Tasks
Ryosuke Matsuda, Keito Kudo, Haruto Yoshida, Nobuyuki Shimizu, Jun Suzuki
The 32nd Annual Meeting of the Association for Natural Language Processing (NLP 2026), Utsunomiya, Tochigi, pp. 596–601, March 2026.
Abstract
We propose SLVMEval, a benchmark for meta-evaluating the performance of evaluation systems for video generation models. SLVMEval consists of synthetically constructed pairs of high-quality and low-quality videos. These are long videos, averaging about 19 minutes and reaching up to about 3 hours in length. An evaluation system is scored by its accuracy in identifying which video in a given pair is of higher quality. Our experiments show that existing automatic evaluation systems fall short of human accuracy in 9 of 10 aspects, with especially poor performance on aspects concerning prompt-video consistency.
[3] An Analysis of the Shitsukan (Material Perception) Abilities of Large Vision-Language Models
Ryosuke Matsuda, Daiki Shiono, Ana Brasard, Jun Suzuki
The 31st Annual Meeting of the Association for Natural Language Processing (NLP 2025), Nagasaki, pp. 2550–2555, March 2025.
Abstract
This study focuses on shitsukan (material perception): we investigate the material-perception abilities of large vision-language models (LVLMs) and analyze how well their perception aligns with that of humans. We first manually collected the perception terms that humans associate with objects in images. Based on these terms, we designed a classification task that evaluates whether an LVLM can select appropriate perception terms, and computed the accuracy of both LVLMs and humans. We also conducted a generation task in which LVLMs produce perception terms that humans then rate. Finally, we confirmed that LVLMs with high classification accuracy also score highly on the generation task, suggesting that the classification task may offer a simple way to evaluate not only an LVLM's material-perception ability but also its alignment with human perception.
Activity
Teaching Assistant (TA)
  • Technical Support · Tohoku University · 2025.04 - 2027.03
    Provide learning support and ICT consultations at Kawauchi North Campus.
  • TIS Lecture · Tohoku University · 2025.10 - 2026.03
    TIS Lecture: Practical Lecture of System Integration for AI Era 2025
  • Generative AI App Development · Tohoku University · 2025.04 - 2025.08
    Teaching Assistant for the course on developing practical applications using generative AI.
Research Assistant (RA)
  • LINE Yahoo Corporation · Part-time · 2025.03 - Present
    Research collaboration with LINE Yahoo Corporation as a research assistant.
Awards
Early Graduation
Tohoku University Early Graduation
Graduated from Tohoku University (Bachelor) in three years! (5 of 2,402 students ≒ 0.21%)
Skills
Python C++ SwiftUI AWS Docker/Singularity Coding Agent