Publications

Most cited and implemented works with links to their production landings.

Landed
arXiv2025-12-02
Alibaba Group

Enhancing Talent Search Ranking with Role-Aware Expert Mixtures and LLM-based Fine-Grained Job Descriptions

Abstract: Talent search is a cornerstone of modern recruitment systems, yet existing approaches often struggle to capture nuanced job-specific preferences, model recruiter behavior at a fine-grained level, and mitigate noise from subjective human judgments. We present a novel framework that enhances talent search effectiveness and delivers substantial business value through two key innovations: (i) leveraging LLMs to extract fine-grained recruitment signals from job descriptions and historical hiring data, and (ii) employing a role-aware multi-gate MoE network to capture behavioral differences across recruiter roles. To further reduce noise, we introduce a multi-task learning module that jointly optimizes click-through rate (CTR), conversion rate (CVR), and resume matching relevance. Experiments on real-world recruitment data and online A/B testing show relative AUC gains of 1.70% (CTR) and 5.97% (CVR), and a 17.29% lift in click-through conversion rate. These improvements reduce dependence on external sourcing channels, enabling an estimated annual cost saving of millions of CNY.

cs.IRcs.AIcs.LG
Scenario InsightTalent Search Ranking
Enhances ranking in recruitment systems for matching candidates to job descriptions based on recruiter behavior.
CTR+1.70%CVR+5.97%
Landed
arXiv2025-12-02
Xiaohongshu Inc.

Optimizing Generative Ranking Relevance via Reinforcement Learning in Xiaohongshu Search

Abstract: Ranking relevance is a fundamental task in search engines, aiming to identify the items most relevant to a given user query. Traditional relevance models typically produce scalar scores or directly predict relevance labels, limiting both interpretability and the modeling of complex relevance signals. Inspired by recent advances in Chain-of-Thought (CoT) reasoning for complex tasks, we investigate whether explicit reasoning can enhance both interpretability and performance in relevance modeling. However, existing reasoning-based Generative Relevance Models (GRMs) primarily rely on supervised fine-tuning on large amounts of human-annotated or synthetic CoT data, which often leads to limited generalization. Moreover, domain-agnostic, free-form reasoning tends to be overly generic and insufficiently grounded, limiting its potential to handle the diverse and ambiguous cases prevalent in open-domain search. In this work, we formulate relevance modeling in Xiaohongshu search as a reasoning task and introduce a Reinforcement Learning (RL)-based training framework to enhance the grounded reasoning capabilities of GRMs. Specifically, we incorporate practical business-specific relevance criteria into the multi-step reasoning prompt design and propose Stepwise Advantage Masking (SAM), a lightweight process-supervision strategy which facilitates effective learning of these criteria through improved credit assignment. To enable industrial deployment, we further distill the large-scale RL-tuned model to a lightweight version suitable for real-world search systems. Extensive experiments on industrial datasets, along with online A/B tests, demonstrate the effectiveness of our approach.

cs.IRcs.AI
Scenario InsightXiaohongshu Search
Search engine ranking relevance optimization using generative models and RL.
Landed
arXiv2025-12-02

First On-Orbit Demonstration of a Geospatial Foundation Model

Abstract: Geospatial foundation models (GeoFMs) promise broad generalisation capacity for Earth observation (EO) tasks, particularly under data-limited conditions. However, their large size poses a barrier to deployment on resource-constrained space hardware. To address this, we present compact variants of a Vision Transformer (ViT)-based GeoFM that preserve downstream task performance while enabling onboard execution. Evaluation across five downstream tasks and validation in two representative flight environments show that model compression and domain adaptation are critical to reducing size and resource demands while maintaining high performance under operational conditions. We further demonstrate reliable on-orbit inference with the IMAGIN-e payload aboard the International Space Station. These results establish a pathway from large GeoFMs to flight-ready, resource-efficient deployments, expanding the feasibility of onboard AI for EO missions.

cs.LGcs.AIcs.CV
Scenario InsightIMAGIN-e payload for Geospatial Foundation Model (GeoFM)
Compact ViT-based GeoFM for Earth observation (EO) tasks on resource-constrained space hardware.
Landed
arXiv2025-12-02
TIFIN India

Multilingual Conversational AI for Financial Assistance: Bridging Language Barriers in Indian FinTech

Abstract: India's linguistic diversity presents both opportunities and challenges for fintech platforms. While the country has 31 major languages and over 100 minor ones, only 10\% of the population understands English, creating barriers to financial inclusion. We present a multilingual conversational AI system for a financial assistance use case that supports code-mixed languages like Hinglish, enabling natural interactions for India's diverse user base. Our system employs a multi-agent architecture with language classification, function management, and multilingual response generation. Through comparative analysis of multiple language models and real-world deployment, we demonstrate significant improvements in user engagement while maintaining low latency overhead (4-8\%). This work contributes to bridging the language gap in digital financial services for emerging markets.

cs.CL
Scenario InsightFinancial Assistance in Indian FinTech
多语言对话 AI 系统,用于印度 FinTech 平台的金融协助,支持 code-mixed 语言如 Hinglish。
latency overhead4-8%
Landed
arXiv2025-12-02

Mofasa: A Step Change in Metal-Organic Framework Generation

Abstract: Mofasa is an all-atom latent diffusion model with state-of-the-art performance for generating Metal-Organic Frameworks (MOFs). These are highly porous crystalline materials used to harvest water from desert air, capture carbon dioxide, store toxic gases and catalyse chemical reactions. In recognition of their value, the development of MOFs recently received a Nobel Prize in Chemistry. In many ways, MOFs are well-suited for exploiting generative models in chemistry: they are rationally-designable materials with a large combinatorial design space and strong structure-property couplings. And yet, to date, a high performance generative model has been lacking. To fill this gap, we introduce Mofasa, a general-purpose latent diffusion model that jointly samples positions, atom-types and lattice vectors for systems as large as 500 atoms. Mofasa avoids handcrafted assembly algorithms common in the literature, unlocking the simultaneous discovery of metal nodes, linkers and topologies. To help the scientific community build on our work, we release MofasaDB, an annotated library of hundreds of thousands of sampled MOF structures, along with a user-friendly web interface for search and discovery: https://mofux.ai/ .

cs.LGcond-mat.mtrl-sci
Scenario InsightMofasa
All-atom latent diffusion model for generating Metal-Organic Frameworks (MOFs), with applications in water harvesting, CO2 capture, gas storage, and catalysis.
Landed
arXiv2025-12-02
Kuaishou Technology

Heterogeneous Multi-treatment Uplift Modeling for Trade-off Optimization in Short-Video Recommendation

arXiv:2511.18997v2 Announce Type: replace Abstract: The rapid proliferation of short videos on social media platforms presents unique challenges and opportunities for recommendation systems. Users exhibit diverse preferences, and the responses resulting from different strategies often conflict with one another, potentially exhibiting inverse correlations between metrics such as watch time and video view counts. Existing uplift models face limitations in handling the heterogeneous multi-treatment scenarios of short-video recommendations, often failing to effectively capture both the synergistic and individual causal effects of different strategies. Furthermore, traditional fixed-weight approaches for balancing these responses lack personalization and can result in biased decision-making. To address these issues, we propose a novel Heterogeneous Multi-treatment Uplift Modeling (HMUM) framework for trade-off optimization in short-video recommendations. HMUM comprises an Offline Hybrid Uplift Modeling (HUM) module, which captures the synergistic and individual effects of multiple strategies, and an Online Dynamic Decision-Making (DDM) module, which estimates the value weights of different user responses in real-time for personalized decision-making. Evaluated on two public datasets, an industrial dataset, and through online A/B experiments on the Kuaishou platform, our model demonstrated superior offline performance and significant improvements in key metrics. It is now fully deployed on the platform, benefiting hundreds of millions of users.

cs.IR
Scenario InsightKuaishou short-video recommendation
Recommendation system for short videos on the Kuaishou platform, addressing trade-offs between metrics like watch time and video view counts.
Landed
arXiv2025-12-02
University College Dublin

How Similar Are Grokipedia and Wikipedia? A Multi-Dimensional Textual and Structural Comparison

arXiv:2510.26899v3 Announce Type: replace-cross Abstract: The launch of Grokipedia - an AI-generated encyclopedia developed by Elon Musk's xAI - was presented as a response to perceived ideological and structural biases in Wikipedia, aiming to produce "truthful" entries using the Grok large language model. Yet whether an AI-driven alternative can escape the biases and limitations of human-edited platforms remains unclear. This study conducts a large-scale computational comparison of more than 17,000 matched article pairs from the 20,000 most-edited English Wikipedia pages. Using metrics spanning lexical richness, readability, reference density, structural features, and semantic similarity, we assess how closely the two platforms align in form and substance. We find that Grokipedia articles are substantially longer and contain significantly fewer references per word. Moreover, Grokipedia's content divides into two distinct groups: one that remains semantically and stylistically aligned with Wikipedia, and another that diverges sharply. Among the dissimilar articles, we observe a systematic rightward shift in the political bias of cited news sources, concentrated primarily in entries related to politics, history, and religion. These findings suggest that AI-generated encyclopedic content diverges from established editorial norms-favouring narrative expansion over citation-based verification. The implications highlight emerging tensions around transparency, provenance, and the governance of knowledge in an era of automated text generation.

cs.CYcs.AIcs.SI
Scenario InsightGrokipedia
AI-generated encyclopedia developed by xAI using the Grok large language model.
Landed
arXiv2025-11-21
Alibaba (Taobao & Tmall Group)

AIF: Asynchronous Inference Framework for Cost-Effective Pre-Ranking

In industrial recommendation systems, pre-ranking models based on deep neural networks (DNNs) commonly adopt a sequential execution framework: feature fetching and model forward computation are triggered only after receiving candidates from the upstream retrieval stage. This design introduces inherent bottlenecks, including redundant computations of identical users/items and increased latency due to strictly sequential operations, which jointly constrain the model's capacity and system efficiency. To address these limitations, we propose the Asynchronous Inference Framework (AIF), a cost-effective computational architecture that decouples interaction-independent components, those operating within a single user or item, from real-time prediction. AIF reorganizes the model inference process by performing user-side computations in parallel with the retrieval stage and conducting item-side computations in a nearline manner. This means that interaction-independent components are calculated just once and completed before the real-time prediction phase of the pre-ranking stage. As a result, AIF enhances computational efficiency and reduces latency, freeing up resources to significantly improve the feature set and model architecture of interaction-independent components. Moreover, we delve into model design within the AIF framework, employing approximated methods for interaction-dependent components in online real-time predictions. By co-designing both the framework and the model, our solution achieves notable performance gains without significantly increasing computational and latency costs. This has enabled the successful deployment of AIF in the Taobao display advertising system.

cs.LGcs.IR
Scenario InsightTaobao display advertising system
淘宝展示广告系统的预排序阶段
Landed
arXiv2025-11-21

Orion: A Unified Visual Agent for Multimodal Perception, Advanced Visual Reasoning and Execution

We introduce Orion, a visual agent that integrates vision-based reasoning with tool-augmented execution to achieve powerful, precise, multi-step visual intelligence across images, video, and documents. Unlike traditional vision-language models that generate descriptive outputs, Orion orchestrates a suite of specialized computer vision tools, including object detection, keypoint localization, panoptic segmentation, Optical Character Recognition (OCR), and geometric analysis, to execute complex multi-step visual workflows. The system achieves competitive performance across MMMU, MMBench, DocVQA, and MMLongBench while extending monolithic VLM capabilities to production-grade visual intelligence. Through its agentic, tool-augmented approach, Orion enables autonomous visual reasoning that bridges neural perception with symbolic execution, marking the transition from passive visual understanding to active, tool-driven visual intelligence. Try Orion for free at: https://chat.vlm.run Learn more at: https://www.vlm.run/orion

cs.CVcs.AIcs.LG
Scenario InsightOrion
A unified visual agent integrating vision-based reasoning with tool-augmented execution for multimodal perception, advanced visual reasoning, and execution across images, video, and documents.
Landed
arXiv2025-11-21

Comparative Security Performance of Workday Cloud ERP Across Key Dimensions

Workday is a cloud-based Enterprise Resource Planning-ERP system that brings HR, Finance, Supply Chain functions , Prism Analytics and Extend custom built in application together under an integrated software as a service SaaS environment. As every organization that undergoes digital transformation, the importance of securing sensitive enterprise data in cloud ERP systems has always been more challeging. To analyze Workday's security architecture, we present a Security analysis in both CIA Triad Enhanced Framework and Zero Trust Security Architecture. The study examines five key dimensions confidentiality, integrity, availability, authentication, and compliance with weighted sub metric analysis and qualitative document review. The results show Workday delivers a composite score of 0.86 with an overall score that closely matches international standards of best practices like GDPR, HIPAA, SOC 2, etc. The platform uses encryption protocols, granular access controls, network safeguards, and continuous verification mechanisms to enable least-privilege access and adaptive defense. Security groups and business process access rules provide scalable governance across very large organizational structures.Workday's layered security to tackle everyday cloud security weaknesses. The work concludes that Workday's architecture demonstrates the best practices for secure, scalable, and compliant ERP application-oriented deployment, which can make this a standard for enterprise cloud security management. These insights provide important guidance for organizations that wish to bolster their cloud ERP defenses and stay ahead of changing regulatory expectations.

cs.CY
Scenario InsightWorkday Cloud ERP
Cloud-based SaaS ERP system integrating HR, Finance, Supply Chain, Prism Analytics, and custom applications.
Landed
arXiv2025-11-21

Enhancing Multi-Camera Gymnast Tracking Through Domain Knowledge Integration

We present a robust multi-camera gymnast tracking, which has been applied at international gymnastics championships for gymnastics judging. Despite considerable progress in multi-camera tracking algorithms, tracking gymnasts presents unique challenges: (i) due to space restrictions, only a limited number of cameras can be installed in the gymnastics stadium; and (ii) due to variations in lighting, background, uniforms, and occlusions, multi-camera gymnast detection may fail in certain views and only provide valid detections from two opposing views. These factors complicate the accurate determination of a gymnast's 3D trajectory using conventional multi-camera triangulation. To alleviate this issue, we incorporate gymnastics domain knowledge into our tracking solution. Given that a gymnast's 3D center typically lies within a predefined vertical plane during \revised{much of their} performance, we can apply a ray-plane intersection to generate coplanar 3D trajectory candidates for opposing-view detections. More specifically, we propose a novel cascaded data association (DA) paradigm that employs triangulation to generate 3D trajectory candidates when cross-view detections are sufficient, and resort to the ray-plane intersection when they are insufficient. Consequently, coplanar candidates are used to compensate for uncertain trajectories, thereby minimizing tracking failures. The robustness of our method is validated through extensive experimentation, demonstrating its superiority over existing methods in challenging scenarios. Furthermore, our gymnastics judging system, equipped with this tracking method, has been successfully applied to recent Gymnastics World Championships, earning significant recognition from the International Gymnastics Federation.

cs.CV
Scenario Insightgymnastics judging system
用于国际体操锦标赛的多相机体操运动员跟踪和裁判系统
Landed
arXiv2025-11-20

RL-100: Performant Robotic Manipulation with Real-World Reinforcement Learning

Real-world robotic manipulation in homes and factories demands reliability, efficiency, and robustness that approach or surpass the performance of skilled human operators. We present RL-100, a real-world reinforcement learning framework built on diffusion-based visuomotor policies. RL-100 unifies imitation and reinforcement learning under a single PPO-style objective applied within the denoising process, yielding conservative and stable policy improvements across both offline and online stages. To meet deployment latency constraints, we employ a lightweight consistency distillation procedure that compresses multi-step diffusion into a one-step controller for high-frequency control. The framework is task-, embodiment-, and representation-agnostic, and supports both single-action outputs and action-chunking control. We evaluate RL-100 on seven diverse real-robot manipulation tasks, ranging from dynamic pushing and agile bowling to pouring, cloth folding, unscrewing, and multi-stage juicing. RL-100 attains 100% success across evaluated trials, achieving 900 out of 900 successful episodes, including up to 250 out of 250 consecutive trials on one task, and matches or surpasses expert teleoperators in time-to-completion. Without retraining, a single policy attains approximately 90% zero-shot success under environmental and dynamics shifts, adapts in a few-shot regime to significant task variations (86.7%), and remains robust to aggressive human perturbations (about 95%). In a public shopping-mall deployment, the juicing robot served random customers continuously for roughly seven hours without failure. Together, these results suggest a practical path toward deployment-ready robot learning: start from human priors, align training objectives with human-grounded metrics, and reliably extend performance beyond human demonstrations.

cs.ROcs.AIcs.LG
Scenario Insightreal-world robotic manipulation (e.g., juicing robot)
家用/工厂机器人操作任务,包括动态推物、倒汁、折布、拧螺丝、多阶段榨汁等,并在购物中心公共部署中服务客户。
Landed
arXiv2025-11-20
IFAB Foundation

Weather Maps as Tokens: Transformers for Renewable Energy Forecasting

Accurate renewable energy forecasting is essential to reduce dependence on fossil fuels and enabling grid decarbonization. However, current approaches fail to effectively integrate the rich spatial context of weather patterns with their temporal evolution. This work introduces a novel approach that treats weather maps as tokens in transformer sequences to predict renewable energy. Hourly weather maps are encoded as spatial tokens using a lightweight convolutional neural network, and then processed by a transformer to capture temporal dynamics across a 45-hour forecast horizon. Despite disadvantages in input initialization, evaluation against ENTSO-E operational forecasts shows a reduction in RMSE of about 60% and 20% for wind and solar respectively. A live dashboard showing daily forecasts is available at: https://www.sardiniaforecast.ifabfoundation.it.

cs.LG
Scenario InsightRenewable Energy Forecasting (Wind and Solar)
使用天气地图作为tokens的transformer模型预测风能和太阳能发电量。
Landed
arXiv2025-11-20
Coinbase

Cluster-based Adaptive Retrieval: Dynamic Context Selection for RAG Applications

Retrieval-Augmented Generation (RAG) enhances large language models (LLMs) by pulling in external material, document, code, manuals, from vast and ever-growing corpora, to effectively answer user queries. The effectiveness of RAG depends significantly on aligning the number of retrieved documents with query characteristics: narrowly focused queries typically require fewer, highly relevant documents, whereas broader or ambiguous queries benefit from retrieving more extensive supporting information. However, the common static top-k retrieval approach fails to adapt to this variability, resulting in either insufficient context from too few documents or redundant information from too many. Motivated by these challenges, we introduce Cluster-based Adaptive Retrieval (CAR), an algorithm that dynamically determines the optimal number of documents by analyzing the clustering patterns of ordered query-document similarity distances. CAR detects the transition point within similarity distances, where tightly clustered, highly relevant documents shift toward less pertinent candidates, establishing an adaptive cut-off that scales with query complexity. On Coinbase's CDP corpus and the public MultiHop-RAG benchmark, CAR consistently picks the optimal retrieval depth and achieves the highest TES score, outperforming every fixed top-k baseline. In downstream RAG evaluations, CAR cuts LLM token usage by 60%, trims end-to-end latency by 22%, and reduces hallucinations by 10% while fully preserving answer relevance. Since integrating CAR into Coinbase's virtual assistant, we've seen user engagement jump by 200%.

cs.IRcs.AIcs.CL
Scenario InsightCoinbase's virtual assistant
虚拟助手中使用 Retrieval-Augmented Generation (RAG) 的检索增强功能
LLM token usage-60%end-to-end latency-22%
Landed
arXiv2025-11-20

Scalable and Efficient Large-Scale Log Analysis with LLMs: An IT Software Support Case Study

IT environments typically have logging mechanisms to monitor system health and detect issues. However, the huge volume of generated logs makes manual inspection impractical, highlighting the importance of automated log analysis in IT Software Support. In this paper, we propose a log analytics tool that leverages Large Language Models (LLMs) for log data processing and issue diagnosis, enabling the generation of automated insights and summaries. We further present a novel approach for efficiently running LLMs on CPUs to process massive log volumes in minimal time without compromising output quality. We share the insights and lessons learned from deployment of the tool - in production since March 2024 - scaled across 70 software products, processing over 2000 tickets for issue diagnosis, achieving a time savings of 300+ man hours and an estimated $15,444 per month in manpower costs compared to the traditional log analysis practices.

cs.SEcs.AI
Scenario InsightIT Software Support Log Analysis
Automated log processing and issue diagnosis using LLMs for monitoring system health in IT environments across multiple software products.
man-hour savings300+manpower cost savings$15,444 per month
Landed
arXiv2025-11-20
Capital One

FinTRec: Transformer Based Unified Contextual Ads Targeting and Personalization for Financial Applications

Transformer-based architectures are widely adopted in sequential recommendation systems, yet their application in Financial Services (FS) presents distinct practical and modeling challenges for real-time recommendation. These include:a) long-range user interactions (implicit and explicit) spanning both digital and physical channels generating temporally heterogeneous context, b) the presence of multiple interrelated products require coordinated models to support varied ad placements and personalized feeds, while balancing competing business goals. We propose FinTRec, a transformer-based framework that addresses these challenges and its operational objectives in FS. While tree-based models have traditionally been preferred in FS due to their explainability and alignment with regulatory requirements, our study demonstrate that FinTRec offers a viable and effective shift toward transformer-based architectures. Through historic simulation and live A/B test correlations, we show FinTRec consistently outperforms the production-grade tree-based baseline. The unified architecture, when fine-tuned for product adaptation, enables cross-product signal sharing, reduces training cost and technical debt, while improving offline performance across all products. To our knowledge, this is the first comprehensive study of unified sequential recommendation modeling in FS that addresses both technical and business considerations.

cs.LG
Scenario InsightFinancial Services Ads Targeting and Personalization
Transformer-based unified contextual ads targeting and personalization for financial applications, handling long-range user interactions across digital/physical channels and multiple interrelated products for ad placements and personalized feeds.
Landed
arXiv2025-11-20
Meta Platforms

SilverTorch: A Unified Model-based System to Democratize Large-Scale Recommendation on GPUs

Serving deep learning based recommendation models (DLRM) at scale is challenging. Existing systems rely on CPU-based ANN indexing and filtering services, suffering from non-negligible costs and forgoing joint optimization opportunities. Such inefficiency makes them difficult to support more complex model architectures, such as learned similarities and multi-task retrieval. In this paper, we propose SilverTorch, a model-based system for serving recommendation models on GPUs. SilverTorch unifies model serving by replacing standalone indexing and filtering services with layers of served models. We propose a Bloom index algorithm on GPUs for feature filtering and a tensor-native fused Int8 ANN kernel on GPUs for nearest neighbor search. We further co-design the ANN search index and filtering index to reduce GPU memory utilization and eliminate unnecessary computation. Benefit from SilverTorch's serving paradigm, we introduce a OverArch scoring layer and a Value Model to aggregate results across multi-tasks. These advancements improve the accuracy for retrieval and enable future studies for serving more complex models. For ranking, SilverTorch's design accelerates item embedding calculation by caching the pre-calculated embeddings inside the serving model. Our evaluation on the industry-scale datasets show that SilverTorch achieves up to 5.6x lower latency and 23.7x higher throughput compared to the state-of-the-art approaches. We also demonstrate that SilverTorch's solution is 13.35x more cost-efficient than CPU-based solution while improving accuracy via serving more complex models. SilverTorch serves over hundreds of models online across major products and recommends contents for billions of daily active users.

cs.IR
Scenario InsightLarge-scale recommendation systems
服务大规模深度学习推荐模型(DLRM),用于向数十亿日活用户推荐内容,覆盖主要产品。
Landed
arXiv2025-11-20
Kuaishou

CroPS: Improving Dense Retrieval with Cross-Perspective Positive Samples in Short-Video Search

Dense retrieval has become a foundational paradigm in modern search systems, especially on short-video platforms. However, most industrial systems adopt a self-reinforcing training pipeline that relies on historically exposed user interactions for supervision. This paradigm inevitably leads to a filter bubble effect, where potentially relevant but previously unseen content is excluded from the training signal, biasing the model toward narrow and conservative retrieval. In this paper, we present CroPS (Cross-Perspective Positive Samples), a novel retrieval data engine designed to alleviate this problem by introducing diverse and semantically meaningful positive examples from multiple perspectives. CroPS enhances training with positive signals derived from user query reformulation behavior (query-level), engagement data in recommendation streams (system-level), and world knowledge synthesized by large language models (knowledge-level). To effectively utilize these heterogeneous signals, we introduce a Hierarchical Label Assignment (HLA) strategy and a corresponding H-InfoNCE loss that together enable fine-grained, relevance-aware optimization. Extensive experiments conducted on Kuaishou Search, a large-scale commercial short-video search platform, demonstrate that CroPS significantly outperforms strong baselines both offline and in live A/B tests, achieving superior retrieval performance and reducing query reformulation rates. CroPS is now fully deployed in Kuaishou Search, serving hundreds of millions of users daily.

cs.IRcs.CL
Scenario InsightKuaishou Search
大型商业短视频搜索平台
Landed
arXiv2025-11-20
Tongji University, University of Oxford

Infrastructuring Pop-Up Cities with "Social Layer": Designing Serendipitous Co-Livings for Temporary Intentional Communities

After the pandemic, a new form of "pop-up city" has emerged -- co-living gatherings of 100-200 people for 4-8 weeks that differ from conferences and hack houses. These temporary intentional communities leverages existing urban infrastructure, blending daily life (housing, meals, care) with self-organized activities like learning, creating, and socializing. They coordinate bottom-up programming through an "unconference" system for identity, calendaring, RSVP, and social discovery that fosters spontaneous, serendipitous, enduring ties. This paper examines the design of "Social Layer," an unconferencing system for pop-up cities. We studied its real-world deployment in ShanHaiWoo (Jilin, China, 2023), muChiangmai (Chiangmai, Thailand, 2023), Edge Esmeralda, Edge Esmeralda (Healdsburg, CA, USA, 2024), Aleph (Buenos Aires, Argentina, 2024), and Gathering of Tribe (Lisbon, Portugal, 2024). Our findings distill: (1) the strong concept "scaffolded spontaneity" -- infrastructural affordances that balance structure with openness, amplifying participant agency while maintaining privacy and lightweight governance; (2) design implications for design researchers working on pop-up cities.

cs.HCcs.CY
Scenario InsightSocial Layer
Unconferencing system for pop-up cities, supporting identity, calendaring, RSVP, and social discovery to foster serendipitous ties in temporary co-living communities.
Landed
arXiv2025-11-20
Snowflake Inc.

Cortex AISQL: A Production SQL Engine for Unstructured Data

Snowflake's Cortex AISQL is a production SQL engine that integrates native semantic operations directly into SQL. This integration allows users to write declarative queries that combine relational operations with semantic reasoning, enabling them to query both structured and unstructured data effortlessly. However, making semantic operations efficient at production scale poses fundamental challenges. Semantic operations are more expensive than traditional SQL operations, possess distinct latency and throughput characteristics, and their cost and selectivity are unknown during query compilation. Furthermore, existing query engines are not designed to optimize semantic operations. The AISQL query execution engine addresses these challenges through three novel techniques informed by production deployment data from Snowflake customers. First, AI-aware query optimization treats AI inference cost as a first-class optimization objective, reasoning about large language model (LLM) cost directly during query planning to achieve 2-8$\times$ speedups. Second, adaptive model cascades reduce inference costs by routing most rows through a fast proxy model while escalating uncertain cases to a powerful oracle model, achieving 2-6$\times$ speedups while maintaining 90-95% of oracle model quality. Third, semantic join query rewriting lowers the quadratic time complexity of join operations to linear through reformulation as multi-label classification tasks, achieving 15-70$\times$ speedups with often improved prediction quality. AISQL is deployed in production at Snowflake, where it powers diverse customer workloads across analytics, search, and content understanding.

cs.DBcs.AIcs.LG
Scenario InsightCortex AISQL
A production SQL engine that integrates semantic operations into SQL for querying structured and unstructured data.
Landed
arXiv2025-11-20
412 Food Rescue

RescueLens: LLM-Powered Triage and Action on Volunteer Feedback for Food Rescue

Food rescue organizations simultaneously tackle food insecurity and waste by working with volunteers to redistribute food from donors who have excess to recipients who need it. Volunteer feedback allows food rescue organizations to identify issues early and ensure volunteer satisfaction. However, food rescue organizations monitor feedback manually, which can be cumbersome and labor-intensive, making it difficult to prioritize which issues are most important. In this work, we investigate how large language models (LLMs) assist food rescue organizers in understanding and taking action based on volunteer experiences. We work with 412 Food Rescue, a large food rescue organization based in Pittsburgh, Pennsylvania, to design RescueLens, an LLM-powered tool that automatically categorizes volunteer feedback, suggests donors and recipients to follow up with, and updates volunteer directions based on feedback. We evaluate the performance of RescueLens on an annotated dataset, and show that it can recover 96% of volunteer issues at 71% precision. Moreover, by ranking donors and recipients according to their rates of volunteer issues, RescueLens allows organizers to focus on 0.5% of donors responsible for more than 30% of volunteer issues. RescueLens is now deployed at 412 Food Rescue and through semi-structured interviews with organizers, we find that RescueLens streamlines the feedback process so organizers better allocate their time.

cs.CYcs.LG
Scenario InsightRescueLens
LLM-powered tool that categorizes volunteer feedback, suggests follow-ups with donors and recipients, and updates volunteer directions for food rescue operations.
volunteer issues recovery rate96%precision71%
Landed
arXiv2025-11-20

Driving with Regulation: Trustworthy and Interpretable Decision-Making for Autonomous Driving with Retrieval-Augmented Reasoning

Understanding and adhering to traffic regulations is essential for autonomous vehicles to ensure safety and trustworthiness. However, traffic regulations are complex, context-dependent, and differ between regions, posing a major challenge to conventional rule-based decision-making approaches. We present an interpretable, regulation-aware decision-making framework, DriveReg, which enables autonomous vehicles to understand and adhere to region-specific traffic laws and safety guidelines. The framework integrates a Retrieval-Augmented Generation (RAG)-based Traffic Regulation Retrieval Agent, which retrieves relevant rules from regulatory documents based on the current situation, and a Large Language Model (LLM)-powered Reasoning Agent that evaluates actions for legal compliance and safety. Our design emphasizes interpretability to enhance transparency and trustworthiness. To support systematic evaluation, we introduce the DriveReg Scenarios Dataset, a comprehensive dataset of driving scenarios across Boston, Singapore, and Los Angeles, with both hypothesized text-based cases and real-world driving data, constructed and annotated to evaluate models' capacity for regulation understanding and reasoning. We validate our framework on the DriveReg Scenarios Dataset and real-world deployment, demonstrating strong performance and robustness across diverse environments.

cs.AI
Scenario InsightAutonomous Driving
Decision-making framework for autonomous vehicles to understand and adhere to region-specific traffic regulations and safety guidelines.
Landed
arXiv2025-11-19
Xiaomi

DevPiolt: Operation Recommendation for IoT Devices at Xiaomi Home

Operation recommendation for IoT devices refers to generating personalized device operations for users based on their context, such as historical operations, environment information, and device status. This task is crucial for enhancing user satisfaction and corporate profits. Existing recommendation models struggle with complex operation logic, diverse user preferences, and sensitive to suboptimal suggestions, limiting their applicability to IoT device operations. To address these issues, we propose DevPiolt, a LLM-based recommendation model for IoT device operations. Specifically, we first equip the LLM with fundamental domain knowledge of IoT operations via continual pre-training and multi-task fine-tuning. Then, we employ direct preference optimization to align the fine-tuned LLM with specific user preferences. Finally, we design a confidence-based exposure control mechanism to avoid negative user experiences from low-quality recommendations. Extensive experiments show that DevPiolt significantly outperforms baselines on all datasets, with an average improvement of 69.5% across all metrics. DevPiolt has been practically deployed in Xiaomi Home app for one quarter, providing daily operation recommendations to 255,000 users. Online experiment results indicate a 21.6% increase in unique visitor device coverage and a 29.1% increase in page view acceptance rates.

cs.AIcs.LG
Scenario InsightXiaomi Home
Operation recommendation for IoT devices in the Xiaomi Home app
unique visitor device coverage+21.6%page view acceptance rates+29.1%
Landed
arXiv2025-11-19
Alibaba Group (Taobao / Alimama)

MOON Embedding: Multimodal Representation Learning for E-commerce Search Advertising

We introduce MOON, our comprehensive set of sustainable iterative practices for multimodal representation learning for e-commerce applications. MOON has already been fully deployed across all stages of Taobao search advertising system, including retrieval, relevance, ranking, and so on. The performance gains are particularly significant on click-through rate (CTR) prediction task, which achieves an overall +20.00% online CTR improvement. Over the past three years, this project has delivered the largest improvement on CTR prediction task and undergone five full-scale iterations. Throughout the exploration and iteration of our MOON, we have accumulated valuable insights and practical experience that we believe will benefit the research community. MOON contains a three-stage training paradigm of "Pretraining, Post-training, and Application", allowing effective integration of multimodal representations with downstream tasks. Notably, to bridge the misalignment between the objectives of multimodal representation learning and downstream training, we define the exchange rate to quantify how effectively improvements in an intermediate metric can translate into downstream gains. Through this analysis, we identify the image-based search recall as a critical intermediate metric guiding the optimization of multimodal models. Over three years and five iterations, MOON has evolved along four critical dimensions: data processing, training strategy, model architecture, and downstream application. The lessons and insights gained through the iterative improvements will also be shared. As part of our exploration into scaling effects in the e-commerce field, we further conduct a systematic study of the scaling laws governing multimodal representation learning, examining multiple factors such as the number of training tokens, negative samples, and the length of user behavior sequences.

cs.IRcs.AIcs.CV
Scenario InsightTaobao search advertising system
E-commerce search advertising system covering retrieval, relevance, ranking, and CTR prediction stages.
CTR+20.00%
Landed
arXiv2025-11-19

Zipf-Gramming: Scaling Byte N-Grams Up to Production Sized Malware Corpora

A classifier using byte n-grams as features is the only approach we have found fast enough to meet requirements in size (sub 2 MB), speed (multiple GB/s), and latency (sub 10 ms) for deployment in numerous malware detection scenarios. However, we've consistently found that 6-8 grams achieve the best accuracy on our production deployments but have been unable to deploy regularly updated models due to the high cost of finding the top-k most frequent n-grams over terabytes of executable programs. Because the Zipfian distribution well models the distribution of n-grams, we exploit its properties to develop a new top-k n-gram extractor that is up to $35\times$ faster than the previous best alternative. Using our new Zipf-Gramming algorithm, we are able to scale up our production training set and obtain up to 30\% improvement in AUC at detecting new malware. We show theoretically and empirically that our approach will select the top-k items with little error and the interplay between theory and engineering required to achieve these results.

cs.CRcs.LGcs.MS
Scenario InsightMalware Detection
Production deployments for malware detection using byte n-gram classifiers.
Landed
arXiv2025-11-19

Introducing AI to an Online Petition Platform Changed Outputs but not Outcomes

The rapid integration of AI writing tools into online platforms raises critical questions about their impact on content production and outcomes. We leverage a unique natural experiment on Change.org, a leading social advocacy platform, to causally investigate the effects of an in-platform ''write with AI'' tool. To understand the impact of the AI integration, we collected 1.5 million petitions and employed a difference-in-differences analysis. Our findings reveal that in-platform AI access significantly altered the lexical features of petitions and increased petition homogeneity, but did not improve petition outcomes. We confirmed the results in a separate analysis of repeat petition writers who wrote petitions before and after introduction of the AI tool. The results suggest that while AI writing tools can profoundly reshape online content, their practical utility for improving desired outcomes may be less beneficial than anticipated, and introduce unintended consequences like content homogenization.

cs.CY
Scenario InsightChange.org
Online petition platform for social advocacy
Landed
arXiv2025-11-19

Deploying Rapid Damage Assessments from sUAS Imagery for Disaster Response

This paper presents the first AI/ML system for automating building damage assessment in uncrewed aerial systems (sUAS) imagery to be deployed operationally during federally declared disasters (Hurricanes Debby and Helene). In response to major disasters, sUAS teams are dispatched to collect imagery of the affected areas to assess damage; however, at recent disasters, teams collectively delivered between 47GB and 369GB of imagery per day, representing more imagery than can reasonably be transmitted or interpreted by subject matter experts in the disaster scene, thus delaying response efforts. To alleviate this data avalanche encountered in practice, computer vision and machine learning techniques are necessary. While prior work has been deployed to automatically assess damage in satellite imagery, there is no current state of practice for sUAS-based damage assessment systems, as all known work has been confined to academic settings. This work establishes the state of practice via the development and deployment of models for building damage assessment with sUAS imagery. The model development involved training on the largest known dataset of post-disaster sUAS aerial imagery, containing 21,716 building damage labels, and the operational training of 91 disaster practitioners. The best performing model was deployed during the responses to Hurricanes Debby and Helene, where it assessed a combined 415 buildings in approximately 18 minutes. This work contributes documentation of the actual use of AI/ML for damage assessment during a disaster and lessons learned to the benefit of the AI/ML research and user communities.

cs.CVcs.AIcs.CY
Scenario InsightsUAS Imagery Damage Assessment for Disaster Response
使用小型无人机(sUAS)图像自动化评估建筑物损坏,应用于飓风等联邦宣布灾害响应。
Landed
arXiv2025-11-19

GAEA: Experiences and Lessons Learned from a Country-Scale Environmental Digital Twin

This paper describes the experiences and lessons learned after the deployment of a country-scale environmental digital twin on the island of Cyprus for three years. This digital twin, called GAEA, contains 27 environmental geospatial services and is suitable for urban planners, policymakers, farmers, property owners, real-estate and forestry professionals, as well as insurance companies and banks that have properties in their portfolio. This paper demonstrates the power, potential, current and future challenges of geospatial analytics and environmental digital twins on a large scale.

cs.CYcs.AIcs.ET
Scenario InsightGAEA
Country-scale environmental digital twin on the island of Cyprus containing 27 environmental geospatial services for urban planners, policymakers, farmers, property owners, real-estate and forestry professionals, insurance companies, and banks.
Landed
arXiv2025-11-19
JetBrains

In-IDE Programming Courses: Learning Software Development in a Real-World Setting

While learning programming languages is crucial for software engineers, mastering the necessary tools is equally important. To facilitate this, JetBrains recently released the JetBrains Academy plugin, which customizes the IDE for learners, allowing tutors to create courses entirely within IDE. In this work, we provide the first exploratory study of this learning format. We carried out eight one-hour interviews with students and developers who completed at least one course using the plugin, inquiring about their experience with the format, the used IDE features, and the current shortcomings. Our results indicate that learning inside the IDE is overall welcomed by the learners, allowing them to study in a more realistic setting, using features such as debugging and code analysis, which are crucial for real software development. With the collected results and the analysis of the current drawbacks, we aim to contribute to teaching students more practical skills.

cs.SEcs.CYcs.HC
Scenario InsightJetBrains Academy plugin
Plugin that customizes the IDE for learners, enabling tutors to create programming courses entirely within the IDE for real-world software development learning.
Landed
arXiv2025-11-19
Meituan

From Reasoning LLMs to BERT: A Two-Stage Distillation Framework for Search Relevance

Query-service relevance prediction in e-commerce search systems faces strict latency requirements that prevent the direct application of Large Language Models (LLMs). To bridge this gap, we propose a two-stage reasoning distillation framework to transfer reasoning capabilities from a powerful teacher LLM to a lightweight, deployment-friendly student model. In the first stage, we address the limitations of general-purpose LLMs by constructing a domain-adapted teacher model. This is achieved through a three-step process: domain-adaptive pre-training to inject platform knowledge, supervised fine-tuning to elicit reasoning skills, and preference optimization with a multi-dimensional reward model to ensure the generation of reliable and preference-aligned reasoning paths. This teacher can then automatically annotate massive query-service pairs from search logs with both relevance labels and reasoning chains. In the second stage, to address the challenges of architectural heterogeneity in standard distillation, we introduce Contrastive Reasoning Self-Distillation (CRSD). By modeling the behavior of the same student model under "standard" and "reasoning-augmented" inputs as a teacher-student relationship, CRSD enables the lightweight model to internalize the teacher's complex decision-making mechanisms without needing the explicit reasoning path at inference. Offline evaluations and online A/B testing in the Meituan search advertising system demonstrate that our framework achieves significant improvements across multiple metrics, validating its effectiveness and practical value.

cs.IRcs.AI
Scenario InsightMeituan search advertising system
电商搜索系统中的 query-service relevance prediction,特别是搜索广告场景
Landed
arXiv2025-11-19
Taobao & Tmall Group of Alibaba

TaoSearchEmb: A Multi-Objective Reinforcement Learning Framework for Dense Retrieval in Taobao Search

Dense retrieval, as the core component of e-commerce search engines, maps user queries and items into a unified semantic space through pre-trained embedding models to enable large-scale real-time semantic retrieval. Despite the rapid advancement of LLMs gradually replacing traditional BERT architectures for embedding, their training paradigms still adhere to BERT-like supervised fine-tuning and hard negative mining strategies. This approach relies on complex offline hard negative sample construction pipelines, which constrain model iteration efficiency and hinder the evolutionary potential of semantic representation capabilities. Besides, existing multi-task learning frameworks face the seesaw effect when simultaneously optimizing semantic relevance and non-relevance objectives. In this paper, we propose Retrieval-GRPO, a multi-objective reinforcement learning-based dense retrieval framework designed to address these challenges. The method eliminates offline hard negative sample construction by dynamically retrieving Top-K candidate products for each query during training, while introducing a relevance LLM as a reward model to generate real-time feedback. Specifically, the retrieval model dynamically optimizes embedding representations through reinforcement learning, with reward signals combining LLM-generated relevance scores, product quality scores, and multi-way exclusivity metrics to achieve multi-objective user preference alignment and real-time error correction. This mechanism not only removes dependency on hard negatives but also mitigates the seesaw effect through collaborative multi-objective optimization, significantly enhancing the model's semantic generalization capability for complex long-tail queries. Extensive offline and online experiments validate the effectiveness of Retrieval-GRPO, which has been deployed on China's largest e-commerce platform.

cs.IR
Scenario InsightTaobao Search
Dense retrieval component in Taobao's e-commerce search engine.
Landed
arXiv2025-11-19
Purdue University

Making Evidence Actionable in Adaptive Learning

Adaptive learning often diagnoses precisely yet intervenes weakly, yielding help that is mistimed or misaligned. This study presents evidence supporting an instructor-governed feedback loop that converts concept-level assessment evidence into vetted micro-interventions. The adaptive learning algorithm contains three safeguards: adequacy as a hard guarantee of gap closure, attention as a budgeted constraint for time and redundancy, and diversity as protection against overfitting to a single resource. We formalize intervention assignment as a binary integer program with constraints for coverage, time, difficulty windows informed by ability estimates, prerequisites encoded by a concept matrix, and anti-redundancy enforced through diversity. Greedy selection serves low-richness and tight-latency regimes, gradient-based relaxation serves rich repositories, and a hybrid method transitions along a richness-latency frontier. In simulation and in an introductory physics deployment with one thousand two hundred four students, both solvers achieved full skill coverage for essentially all learners within bounded watch time. The gradient-based method reduced redundant coverage by approximately twelve percentage points relative to greedy and harmonized difficulty across slates, while greedy delivered comparable adequacy with lower computational cost in scarce settings. Slack variables localized missing content and supported targeted curation, sustaining sufficiency across subgroups. The result is a tractable and auditable controller that closes the diagnostic-pedagogical loop and delivers equitable, load-aware personalization at classroom scale.

cs.AIcs.CEstat.AP
Scenario InsightAdaptive learning for introductory physics
Instructor-governed feedback loop converting concept-level assessments into micro-interventions in physics education.
Landed
arXiv2025-11-18

Small Models, Big Support: A Local LLM Framework for Educator-Centric Content Creation and Assessment with RAG and CAG

While Large Language Models (LLMs) are increasingly applied in student-facing educational tools, their potential to directly support educators through locally deployable and customizable solutions remains underexplored. Many existing approaches rely on proprietary, cloud-based systems that raise significant cost, privacy, and control concerns for educational institutions. To address these barriers, we introduce an end-to-end, open-source framework that empowers educators using small (3B-7B parameter), locally deployable LLMs. Our system is designed for comprehensive teacher support, including customized teaching material generation and AI-assisted assessment. The framework synergistically combines Retrieval-Augmented Generation (RAG) and Context-Augmented Generation (CAG) to produce factually accurate, pedagogically-styled content. A core feature is an interactive refinement loop, a teacher-in-the-loop mechanism that ensures educator agency and precise alignment of the final output. To enhance reliability and safety, an auxiliary verifier LLM inspects all generated content. We validate our framework through a rigorous evaluation of its content generation capabilities and report on a successful technical deployment in a college physics course, which confirms its feasibility on standard institutional hardware. Our findings demonstrate that carefully engineered, self-hosted systems built on small LLMs can provide robust, affordable, and private support for educators, achieving practical utility comparable to much larger models for targeted instructional tasks. This work presents a practical blueprint for the development of sovereign AI tools tailored to the real-world needs of educational institutions.

cs.CYcs.AI
Scenario InsightLocal LLM Framework for Educator-Centric Content Creation and Assessment
Open-source framework using small LLMs with RAG and CAG for generating teaching materials and AI-assisted assessment, including interactive refinement and verifier.
Landed
arXiv2025-11-18

Coordinated Humanoid Robot Locomotion with Symmetry Equivariant Reinforcement Learning Policy

The human nervous system exhibits bilateral symmetry, enabling coordinated and balanced movements. However, existing Deep Reinforcement Learning (DRL) methods for humanoid robots neglect morphological symmetry of the robot, leading to uncoordinated and suboptimal behaviors. Inspired by human motor control, we propose Symmetry Equivariant Policy (SE-Policy), a new DRL framework that embeds strict symmetry equivariance in the actor and symmetry invariance in the critic without additional hyperparameters. SE-Policy enforces consistent behaviors across symmetric observations, producing temporally and spatially coordinated motions with higher task performance. Extensive experiments on velocity tracking tasks, conducted in both simulation and real-world deployment with the Unitree G1 humanoid robot, demonstrate that SE-Policy improves tracking accuracy by up to 40% compared to state-of-the-art baselines, while achieving superior spatial-temporal coordination. These results demonstrate the effectiveness of SE-Policy and its broad applicability to humanoid robots.

cs.RO
Scenario InsightUnitree G1 humanoid robot
人形机器人协调运动,特别是速度跟踪任务
tracking accuracy+40%
Landed
arXiv2025-11-18
ByteDance

LEMUR: Large scale End-to-end MUltimodal Recommendation

Traditional ID-based recommender systems often struggle with cold-start and generalization challenges. Multimodal recommendation systems, which leverage textual and visual data, offer a promising solution to mitigate these issues. However, existing industrial approaches typically adopt a two-stage training paradigm: first pretraining a multimodal model, then applying its frozen representations to train the recommendation model. This decoupled framework suffers from misalignment between multimodal learning and recommendation objectives, as well as an inability to adapt dynamically to new data. To address these limitations, we propose LEMUR, the first large-scale multimodal recommender system trained end-to-end from raw data. By jointly optimizing both the multimodal and recommendation components, LEMUR ensures tighter alignment with downstream objectives while enabling real-time parameter updates. Constructing multimodal sequential representations from user history often entails prohibitively high computational costs. To alleviate this bottleneck, we propose a novel memory bank mechanism that incrementally accumulates historical multimodal representations throughout the training process. After one month of deployment in Douyin Search, LEMUR has led to a 0.843% reduction in query change rate decay and a 0.81% improvement in QAUC. Additionally, LEMUR has shown significant gains across key offline metrics for Douyin Advertisement. Our results validate the superiority of end-to-end multimodal recommendation in real-world industrial scenarios.

cs.IR
Scenario InsightDouyin Search and Douyin Advertisement
Multimodal recommendation deployed in Douyin (ByteDance's short-video app) search and advertisement systems.
query change rate decay-0.843%QAUC+0.81%
Landed
arXiv2025-11-18
Harvard Medical School

Ken Utilization Layer: Hebbian Replay Within a Student's Ken for Adaptive Exercise Recommendation

Adaptive exercise recommendation (ER) aims to choose the next activity that matches a learner's evolving Zone of Proximal Development (ZPD). We present KUL-Rec, a biologically inspired ER system that couples a fast Hebbian memory with slow replay-based consolidation to enable continual, few-shot personalization from sparse interactions. The model operates in an embedding space, allowing a single architecture to handle both tabular knowledge-tracing logs and open-ended short-answer text. We align evaluation with tutoring needs using bidirectional ranking and rank-sensitive metrics (nDCG, Recall@K). Across ten public datasets, KUL-Rec improves macro nDCG (0.316 vs. 0.265 for the strongest baseline) and Recall@10 (0.305 vs. 0.211), while achieving low inference latency and an $\approx99$\% reduction in peak GPU memory relative to a competitive graph-based model. In a 13-week graduate course, KUL-Rec personalized weekly short-answer quizzes generated by a retrieval-augmented pipeline and the personalized quizzes were associated with lower perceived difficulty and higher helpfulness (p < .05). An embedding robustness audit highlights that encoder choice affects semantic alignment, motivating routine audits when deploying open-response assessment. Together, these results indicate that Hebbian replay with bounded consolidation offers a practical path to real-time, interpretable ER that scales across data modalities and classroom settings.

cs.CYcs.AIcs.LG
Scenario InsightAdaptive exercise recommendation in education
Recommending personalized exercises and quizzes to learners based on their Zone of Proximal Development in educational settings like graduate courses.
Landed
arXiv2025-11-18

IndiTag: An Online Media Bias Analysis System Using Fine-Grained Bias Indicators

In the age of information overload and polarized discourse, understanding media bias has become imperative for informed decision-making and fostering a balanced public discourse. However, without the experts' analysis, it is hard for the readers to distinguish bias from the news articles. This paper presents IndiTag, an innovative online media bias analysis system that leverages fine-grained bias indicators to dissect and distinguish bias in digital content. IndiTag offers a novel approach by incorporating large language models, bias indicators, and vector database to detect and interpret bias automatically. Complemented by a user-friendly interface facilitating automated bias analysis for readers, IndiTag offers a comprehensive platform for in-depth bias examination. We demonstrate the efficacy and versatility of IndiTag through experiments on four datasets encompassing news articles from diverse platforms. Furthermore, we discuss potential applications of IndiTag in fostering media literacy, facilitating fact-checking initiatives, and enhancing the transparency and accountability of digital media platforms. IndiTag stands as a valuable tool in the pursuit of fostering a more informed, discerning, and inclusive public discourse in the digital age. We release an online system for end users and the source code is available at https://github.com/lylin0/IndiTag.

cs.CY
Scenario InsightIndiTag
An online media bias analysis system that uses fine-grained bias indicators, large language models, and vector databases to detect and interpret bias in news articles.
Landed
arXiv2025-11-18
SambaNova

SnapStream: Efficient Long Sequence Decoding on Dataflow Accelerators

The proliferation of 100B+ parameter Large Language Models (LLMs) with 100k+ context length support have resulted in increasing demands for on-chip memory to support large KV caches. Techniques such as StreamingLLM and SnapKV demonstrate how to control KV cache size while maintaining model accuracy. Yet, these techniques are not commonly used within industrial deployments using frameworks like vLLM or SGLang. The reason is twofold: on one hand, the static graphs and continuous batching methodology employed by these frameworks make it difficult to admit modifications to the standard multi-head attention algorithm, while on the other hand, the accuracy implications of such techniques on modern instruction-following and reasoning models are not well understood, obfuscating the need for implementing these techniques. In this paper, we explore these accuracy implications on Llama-3.1-8B-Instruct and DeepSeek-R1, and develop SnapStream, a KV cache compression method that can be deployed at scale. We demonstrate the efficacy of SnapStream in a 16-way tensor-parallel deployment of DeepSeek-671B on SambaNova SN40L accelerators running at 128k context length and up to 1832 tokens per second in a real production setting. SnapStream enables $4\times$ improved on-chip memory usage and introduces minimal accuracy degradation on LongBench-v2, AIME24 and LiveCodeBench. To the best of our knowledge, this is the first implementation of sparse KV attention techniques deployed in a production inference system with static graphs and continuous batching.

cs.AIcs.ARcs.DC
Scenario InsightLLM inference serving
Large-scale serving of long-context LLMs (e.g., DeepSeek-671B at 128k context) using dataflow accelerators with continuous batching and static graphs.
Landed
arXiv2025-11-18
Ben-Gurion University of the Negev

A GPU-Accelerated RAG-Based Telegram Assistant for Supporting Parallel Processing Students

This project addresses a critical pedagogical need: offering students continuous, on-demand academic assistance beyond conventional reception hours. I present a domain-specific Retrieval-Augmented Generation (RAG) system powered by a quantized Mistral-7B Instruct model and deployed as a Telegram bot. The assistant enhances learning by delivering real-time, personalized responses aligned with the "Introduction to Parallel Processing" course materials. GPU acceleration significantly improves inference latency, enabling practical deployment on consumer hardware. This approach demonstrates how consumer GPUs can enable affordable, private, and effective AI tutoring for HPC education.

cs.CYcs.AI
Scenario InsightTelegram Assistant for Parallel Processing Students
A domain-specific RAG-based Telegram bot providing real-time academic assistance for the 'Introduction to Parallel Processing' course.
Landed
arXiv2025-11-18
Harvard Library

GRIN Transfer: A production-ready tool for libraries to retrieve digital copies from Google Books

Publicly launched in 2004, the Google Books project has scanned tens of millions of items in partnership with libraries around the world. As part of this project, Google created the Google Return Interface (GRIN). Through this platform, libraries can access their scanned collections, the associated metadata, and the ongoing OCR and metadata improvements that become available as Google reprocesses these collections using new technologies. When downloading the Harvard Library Google Books collection from GRIN to develop the Institutional Books dataset, we encountered several challenges related to rate-limiting and atomized metadata within the GRIN platform. To overcome these challenges and help other libraries make more robust use of their Google Books collections, this technical report introduces the initial release of GRIN Transfer. This open-source and production-ready Python pipeline allows partner libraries to efficiently retrieve their Google Books collections from GRIN. This report also introduces an updated version of our Institutional Books 1.0 pipeline, initially used to analyze, augment, and assemble the Institutional Books 1.0 dataset. We have revised this pipeline for compatibility with the output format of GRIN Transfer. A library could pair these two tools to create an end-to-end processing pipeline for their Google Books collection to retrieve, structure, and enhance data available from GRIN. This report gives an overview of how GRIN Transfer was designed to optimize for reliability and usability in different environments, as well as guidance on configuration for various use cases.

cs.DLcs.IR
Scenario InsightGRIN Transfer
Open-source Python pipeline for libraries to retrieve, process, and enhance their Google Books collections from the GRIN platform.
Landed
arXiv2025-11-18

JobSphere: An AI-Powered Multilingual Career Copilot for Government Employment Platforms

Users of government employment websites commonly face engagement and accessibility challenges linked to navigational complexity, a dearth of language options, and a lack of personalized support. This paper introduces JobSphere, an AI-powered career assistant that is redefining the employment platform in Punjab called PGRKAM. JobSphere employs Retrieval-Augmented Generation (RAG) architecture, and it is multilingual, available in English, Hindi and Punjabi. JobSphere technique uses 4-bit quantization, allowing the platform to deploy on consumer-grade GPUs (i.e., NVIDIA RTX 3050 4GB), making the implementation 89% cheaper than that of cloud-based systems. Key innovations include voice-enabled interaction with the assistant, automated mock tests, resume parsing with skills recognition, and embed-based job recommendation that achieves a precision@10 score of 68%. An evaluation of JobSphere's implementation reveals 94% factual accuracy, a median response time of 1.8 seconds, and a System Usability Scale score of 78.5/100, a 50% improvement compared to the baseline PGRKAM platform context. In conclusion, JobSphere effectively fills significant accessibility gaps for Punjab/Hindi-speaking users in rural locations, while also affirming the users access to trusted job content provided by government agencies.

cs.AIcs.CY
Scenario InsightPGRKAM
Government employment platform in Punjab featuring JobSphere AI career copilot.
Landed
arXiv2025-11-18
Alibaba Group

From Scaling to Structured Expressivity: Rethinking Transformers for CTR Prediction

Despite massive investments in scale, deep models for click-through rate (CTR) prediction often exhibit rapidly diminishing returns - a stark contrast to the smooth, predictable gains seen in large language models. We identify the root cause as a structural misalignment: Transformers assume sequential compositionality, while CTR data demand combinatorial reasoning over high-cardinality semantic fields. Unstructured attention spreads capacity indiscriminately, amplifying noise under extreme sparsity and breaking scalable learning. To restore alignment, we introduce the Field-Aware Transformer (FAT), which embeds field-based interaction priors into attention through decomposed content alignment and cross-field modulation. This design ensures model complexity scales with the number of fields F, not the total vocabulary size n >> F, leading to tighter generalization and, critically, observed power-law scaling in AUC as model width increases. We present the first formal scaling law for CTR models, grounded in Rademacher complexity, that explains and predicts this behavior. On large-scale benchmarks, FAT improves AUC by up to +0.51% over state-of-the-art methods. Deployed online, it delivers +2.33% CTR and +0.66% RPM. Our work establishes that effective scaling in recommendation arises not from size, but from structured expressivity-architectural coherence with data semantics.

cs.IRcs.LG
Scenario InsightCTR Prediction
Click-through rate prediction in recommendation systems, involving high-cardinality fields typical of large-scale advertising or e-commerce scenarios.
CTR+2.33%RPM+0.66%
Landed
arXiv2025-11-18
Tianjin University

RAGPulse: An Open-Source RAG Workload Trace to Optimize RAG Serving Systems

Retrieval-Augmented Generation (RAG) is a critical paradigm for building reliable, knowledge-intensive Large Language Model (LLM) applications. However, the multi-stage pipeline (retrieve, generate) and unique workload characteristics (e.g., knowledge dependency) of RAG systems pose significant challenges for serving performance optimization. Existing generic LLM inference traces fail to capture these RAG-specific dynamics, creating a significant performance gap between academic research and real-world deployment. To bridge this gap, this paper introduces RAGPulse, an open-source RAG workload trace dataset. This dataset was collected from an university-wide Q&A system serving that has served more than 40,000 students and faculties since April 2024. We detail RAGPulse's system architecture, its privacy-preserving hash-based data format, and provide an in-depth statistical analysis. Our analysis reveals that real-world RAG workloads exhibit significant temporal locality and a highly skewed hot document access pattern. RAGPulse provides a high-fidelity foundation for researchers to develop and validate novel optimization strategies for RAG systems, such as content-aware batching and retrieval caching, ultimately enhancing the efficiency and reliability of RAG services. The code is available at https://github.com/flashserve/RAGPulse.

cs.LGcs.DB
Scenario Insightuniversity-wide Q&A system
A RAG-based Q&A system serving over 40,000 students and faculties.
Landed
arXiv2025-11-18
Allen Institute for AI

OlmoEarth: Stable Latent Image Modeling for Multimodal Earth Observation

Earth observation data presents a unique challenge: it is spatial like images, sequential like video or text, and highly multimodal. We present OlmoEarth: a multimodal, spatio-temporal foundation model that employs a novel self-supervised learning formulation, masking strategy, and loss all designed for the Earth observation domain. OlmoEarth achieves state-of-the-art performance compared to 12 other foundation models across a variety of research benchmarks and real-world tasks from external partners. When evaluating embeddings OlmoEarth achieves the best performance on 15 out of 24 tasks, and with full fine-tuning it is the best on 19 of 29 tasks. We deploy OlmoEarth as the backbone of an end-to-end platform for data collection, labeling, training, and inference of Earth observation models. The OlmoEarth Platform puts frontier foundation models and powerful data management tools into the hands of non-profits and NGOs working to solve the world's biggest problems. OlmoEarth source code, training data, and pre-trained weights are available at $\href{https://github.com/allenai/olmoearth_pretrain}{\text{https://github.com/allenai/olmoearth_pretrain}}$.

cs.CVcs.LG
Scenario InsightOlmoEarth Platform
End-to-end platform for data collection, labeling, training, and inference of Earth observation models for non-profits and NGOs.
Landed
arXiv2025-11-17
LinkedIn

An External Fairness Evaluation of LinkedIn Talent Search

We conduct an independent, third-party audit for bias of LinkedIn's Talent Search ranking system, focusing on potential ranking bias across two attributes: gender and race. To do so, we first construct a dataset of rankings produced by the system, collecting extensive Talent Search results across a diverse set of occupational queries. We then develop a robust labeling pipeline that infers the two demographic attributes of interest for the returned users. To evaluate potential biases in the collected dataset of real-world rankings, we utilize two exposure disparity metrics: deviation from group proportions and MinSkew. Our analysis reveals an under-representation of minority groups in early ranks across many queries. We further examine potential causes of this disparity, and discuss why they may be difficult or, in some cases, impossible to fully eliminate among the early ranks of queries. Beyond static metrics, we also investigate the concept of subgroup fairness over time, highlighting temporal disparities in exposure and retention, which are often more difficult to audit for in practice. In employer recruiting platforms such as LinkedIn Talent Search, the persistence of a particular candidate over multiple days in the ranking can directly impact the probability that the given candidate is selected for opportunities. Our analysis reveals demographic disparities in this temporal stability, with some groups experiencing greater volatility in their ranked positions than others. We contextualize all our findings alongside LinkedIn's published self-audits of its Talent Search system and reflect on the methodological constraints of a black-box external evaluation, including limited observability and noisy demographic inference.

cs.CYcs.SI
Scenario InsightTalent Search
招聘平台的人才搜索排名系统,用于返回职业查询结果。
Landed
arXiv2025-11-17

GovScape: A Public Multimodal Search System for 70 Million Pages of Government PDFs

Efforts over the past three decades have produced web archives containing billions of webpage snapshots and petabytes of data. The End of Term Web Archive alone contains, among other file types, millions of PDFs produced by the federal government. While preservation with web archives has been successful, significant challenges for access and discoverability remain. For example, current affordances for browsing the End of Term PDFs are limited to downloading and browsing individual PDFs, as well as performing basic keyword search across them. In this paper, we introduce GovScape, a public search system that supports multimodal searches across 10,015,993 federal government PDFs from the 2020 End of Term crawl (70,958,487 total PDF pages) - to our knowledge, all renderable PDFs in the 2020 crawl that are 50 pages or under. GovScape supports four primary forms of search over these 10 million PDFs: in addition to providing (1) filter conditions over metadata facets including domain and crawl date and (2) exact text search against the PDF text, we provide (3) semantic text search and (4) visual search against the PDFs across individual pages, enabling users to structure queries such as "redacted documents" or "pie charts." We detail the constituent components of GovScape, including the search affordances, embedding pipeline, system architecture, and open source codebase. Significantly, the total estimated compute cost for GovScape's pre-processing pipeline for 10 million PDFs was approximately $1,500, equivalent to 47,000 PDF pages per dollar spent on compute, demonstrating the potential for immediate scalability. Accordingly, we outline steps that we have already begun pursuing toward multimodal search at the 100+ million PDF scale. GovScape can be found at https://www.govscape.net.

cs.IRcs.DL
Scenario InsightGovScape
Public multimodal search system for over 70 million pages of federal government PDFs from the End of Term Web Archive.
Landed
arXiv2025-11-17

Align$^3$GR: Unified Multi-Level Alignment for LLM-based Generative Recommendation

Large Language Models (LLMs) demonstrate significant advantages in leveraging structured world knowledge and multi-step reasoning capabilities. However, fundamental challenges arise when transforming LLMs into real-world recommender systems due to semantic and behavioral misalignment. To bridge this gap, we propose Align$^3$GR, a novel framework that unifies token-level, behavior modeling-level, and preference-level alignment. Our approach introduces: Dual tokenization fusing user-item semantic and collaborative signals. Enhanced behavior modeling with bidirectional semantic alignment. Progressive DPO strategy combining self-play (SP-DPO) and real-world feedback (RF-DPO) for dynamic preference adaptation. Experiments show Align$^3$GR outperforms the SOTA baseline by +17.8% in Recall@10 and +20.2% in NDCG@10 on the public dataset, with significant gains in online A/B tests and full-scale deployment on an industrial large-scale recommendation platform.

cs.IR
Scenario Insightindustrial large-scale recommendation platform
基于 LLM 的生成式推荐系统
Landed
arXiv2025-11-14
MOSTLY AI

Democratizing Tabular Data Access with an Open$\unicode{x2013}$Source Synthetic$\unicode{x2013}$Data SDK

Machine learning development critically depends on access to high-quality data. However, increasing restrictions due to privacy, proprietary interests, and ethical concerns have created significant barriers to data accessibility. Synthetic data offers a viable solution by enabling safe, broad data usage without compromising sensitive information. This paper presents the MOSTLY AI Synthetic Data Software Development Kit (SDK), an open-source toolkit designed specifically for synthesizing high-quality tabular data. The SDK integrates robust features such as differential privacy guarantees, fairness-aware data generation, and automated quality assurance into a flexible and accessible Python interface. Leveraging the TabularARGN autoregressive framework, the SDK supports diverse data types and complex multi-table and sequential datasets, delivering competitive performance with notable improvements in speed and usability. Currently deployed both as a cloud service and locally installable software, the SDK has seen rapid adoption, highlighting its practicality in addressing real-world data bottlenecks and promoting widespread data democratization.

cs.LG
Scenario InsightMOSTLY AI Synthetic Data SDK
Open-source Python toolkit for synthesizing high-quality tabular data, supporting differential privacy, fairness, multi-table datasets, and quality assurance.
Landed
arXiv2025-11-14

A Brain Cell Type Resource Created by Large Language Models and a Multi-Agent AI System for Collaborative Community Annotation

Single-cell RNA sequencing has transformed our ability to identify diverse cell types and their transcriptomic signatures. However, annotating these signatures-especially those involving poorly characterized genes-remains a major challenge. Traditional methods, such as Gene Set Enrichment Analysis (GSEA), depend on well-curated annotations and often perform poorly in these contexts. Large Language Models (LLMs) offer a promising alternative but struggle to represent complex biological knowledge within structured ontologies. To address this, we present BRAINCELL-AID (BRAINCELL-AID: https://biodataai.uth.edu/BRAINCELL-AID), a novel multi-agent AI system that integrates free-text descriptions with ontology labels to enable more accurate and robust gene set annotation. By incorporating retrieval-augmented generation (RAG), we developed a robust agentic workflow that refines predictions using relevant PubMed literature, reducing hallucinations and enhancing interpretability. Using this workflow, we achieved correct annotations for 77% of mouse gene sets among their top predictions. Applying this approach, we annotated 5,322 brain cell clusters from the comprehensive mouse brain cell atlas generated by the BRAIN Initiative Cell Census Network, enabling novel insights into brain cell function by identifying region-specific gene co-expression patterns and inferring functional roles of gene ensembles. BRAINCELL-AID also identifies Basal Ganglia-related cell types with neurologically meaningful descriptions. Hence, we create a valuable resource to support community-driven cell type annotation.

cs.AI
Scenario InsightBRAINCELL-AID
Multi-agent AI system for annotating gene sets in single-cell RNA sequencing data, focused on brain cell types from the mouse brain cell atlas.
Landed
arXiv2025-11-14
Carnegie Mellon University

Does AI-Assisted Coding Deliver? A Difference-in-Differences Study of Cursor's Impact on Software Projects

Large language models (LLMs) have demonstrated the promise to revolutionize the field of software engineering. Among other things, LLM agents are rapidly gaining momentum in their application to software development, with practitioners claiming a multifold productivity increase after adoption. Yet, empirical evidence is lacking around these claims. In this paper, we estimate the causal effect of adopting a widely popular LLM agent assistant, namely Cursor, on development velocity and software quality. The estimation is enabled by a state-of-the-art difference-in-differences design comparing Cursor-adopting GitHub projects with a matched control group of similar GitHub projects that do not use Cursor. We find that the adoption of Cursor leads to a significant, large, but transient increase in project-level development velocity, along with a significant and persistent increase in static analysis warnings and code complexity. Further panel generalized method of moments estimation reveals that the increase in static analysis warnings and code complexity acts as a major factor causing long-term velocity slowdown. Our study carries implications for software engineering practitioners, LLM agent assistant designers, and researchers.

cs.SEcs.AI
Scenario InsightCursor
LLM agent assistant for software development in GitHub projects
Landed
arXiv2025-11-14
Purdue University

Owlgorithm: Supporting Self-Regulated Learning in Competitive Programming through LLM-Driven Reflection

We present Owlgorithm, an educational platform that supports Self-Regulated Learning (SRL) in competitive programming (CP) through AI-generated reflective questions. Leveraging GPT-4o, Owlgorithm produces context-aware, metacognitive prompts tailored to individual student submissions. Integrated into a second- and third-year CP course, the system-provided reflective prompts adapted to student outcomes: guiding deeper conceptual insight for correct solutions and structured debugging for partial or failed ones. Our exploratory assessment of student ratings and TA feedback revealed both promising benefits and notable limitations. While many found the generated questions useful for reflection and debugging, concerns were raised about feedback accuracy and classroom usability. These results suggest advantages of LLM-supported reflection for novice programmers, though refinements are needed to ensure reliability and pedagogical value for advanced learners. From our experience, several key insights emerged: GenAI can effectively support structured reflection, but careful prompt design, dynamic adaptation, and usability improvements are critical to realizing their potential in education. We offer specific recommendations for educators using similar tools and outline next steps to enhance Owlgorithm's educational impact. The underlying framework may also generalize to other reflective learning contexts.

cs.CYcs.AIcs.HC
Scenario InsightOwlgorithm
支持竞技编程自调节学习(SRL)的教育平台
Landed
arXiv2025-11-14
Tencent Inc.

GPR: Towards a Generative Pre-trained One-Model Paradigm for Large-Scale Advertising Recommendation

As an intelligent infrastructure connecting users with commercial content, advertising recommendation systems play a central role in information flow and value creation within the digital economy. However, existing multi-stage advertising recommendation systems suffer from objective misalignment and error propagation, making it difficult to achieve global optimality, while unified generative recommendation models still struggle to meet the demands of practical industrial applications. To address these issues, we propose GPR (Generative Pre-trained Recommender), the first one-model framework that redefines advertising recommendation as an end-to-end generative task, replacing the traditional cascading paradigm with a unified generative approach. To realize GPR, we introduce three key innovations spanning unified representation, network architecture, and training strategy. First, we design a unified input schema and tokenization method tailored to advertising scenarios, mapping both ads and organic content into a shared multi-level semantic ID space, thereby enhancing semantic alignment and modeling consistency across heterogeneous data. Second, we develop the Heterogeneous Hierarchical Decoder (HHD), a dual-decoder architecture that decouples user intent modeling from ad generation, achieving a balance between training efficiency and inference flexibility while maintaining strong modeling capacity. Finally, we propose a multi-stage joint training strategy that integrates Multi-Token Prediction (MTP), Value-Aware Fine-Tuning and the Hierarchy Enhanced Policy Optimization (HEPO) algorithm, forming a complete generative recommendation pipeline that unifies interest modeling, value alignment, and policy optimization. GPR has been fully deployed in the Tencent Weixin Channels advertising system, delivering significant improvements in key business metrics including GMV and CTCVR.

cs.IR
Scenario InsightWeixin Channels advertising system
腾讯微信渠道广告系统,用于连接用户与商业内容的广告推荐。
Landed
arXiv2025-11-13
Thumbtack

Beyond the Hype: Embeddings vs. Prompting for Multiclass Classification Tasks

Are traditional classification approaches irrelevant in this era of AI hype? We show that there are multiclass classification problems where predictive models holistically outperform LLM prompt-based frameworks. Given text and images from home-service project descriptions provided by Thumbtack customers, we build embeddings-based softmax models that predict the professional category (e.g., handyman, bathroom remodeling) associated with each problem description. We then compare against prompts that ask state-of-the-art LLM models to solve the same problem. We find that the embeddings approach outperforms the best LLM prompts in terms of accuracy, calibration, latency, and financial cost. In particular, the embeddings approach has 49.5\% higher accuracy than the prompting approach, and its superiority is consistent across text-only, image-only, and text-image problem descriptions. Furthermore, it yields well-calibrated probabilities, which we later use as confidence signals to provide contextualized user experience during deployment. On the contrary, prompting scores are overly uninformative. Finally, the embeddings approach is 14 and 81 times faster than prompting in processing images and text respectively, while under realistic deployment assumptions, it can be up to 10 times cheaper. Based on these results, we deployed a variation of the embeddings approach, and through A/B testing we observed performance consistent with our offline analysis. Our study shows that for multiclass classification problems that can leverage proprietary datasets, an embeddings-based approach may yield unequivocally better results. Hence, scientists, practitioners, engineers, and business leaders can use our study to go beyond the hype and consider appropriate predictive models for their classification use cases.

cs.LGcs.AIcs.CL
Scenario InsightThumbtack家居服务项目分类
基于客户提供的家居服务项目描述(文本和图像)预测专业类别,如handyman、bathroom remodeling。
Landed
arXiv2025-11-13
Taobao & Tmall Group of Alibaba

TaoSR-AGRL: Adaptive Guided Reinforcement Learning Framework for E-commerce Search Relevance

Query-product relevance prediction is fundamental to e-commerce search and has become even more critical in the era of AI-powered shopping, where semantic understanding and complex reasoning directly shape the user experience and business conversion. Large Language Models (LLMs) enable generative, reasoning-based approaches, typically aligned via supervised fine-tuning (SFT) or preference optimization methods like Direct Preference Optimization (DPO). However, the increasing complexity of business rules and user queries exposes the inability of existing methods to endow models with robust reasoning capacity for long-tail and challenging cases. Efforts to address this via reinforcement learning strategies like Group Relative Policy Optimization (GRPO) often suffer from sparse terminal rewards, offering insufficient guidance for multi-step reasoning and slowing convergence. To address these challenges, we propose TaoSR-AGRL, an Adaptive Guided Reinforcement Learning framework for LLM-based relevance prediction in Taobao Search Relevance. TaoSR-AGRL introduces two key innovations: (1) Rule-aware Reward Shaping, which decomposes the final relevance judgment into dense, structured rewards aligned with domain-specific relevance criteria; and (2) Adaptive Guided Replay, which identifies low-accuracy rollouts during training and injects targeted ground-truth guidance to steer the policy away from stagnant, rule-violating reasoning patterns toward compliant trajectories. TaoSR-AGRL was evaluated on large-scale real-world datasets and through online side-by-side human evaluations on Taobao Search. It consistently outperforms DPO and standard GRPO baselines in offline experiments, improving relevance accuracy, rule adherence, and training stability. The model trained with TaoSR-AGRL has been successfully deployed in the main search scenario on Taobao, serving hundreds of millions of users.

cs.IRcs.AIcs.CL
Scenario InsightTaobao Search
淘宝电商搜索相关性预测场景
Landed
arXiv2025-11-13
University of Washington

Systems for Scaling Accessibility Efforts in Large Computing Courses

It is critically important to make computing courses accessible for disabled students. This is particularly challenging in large computing courses, which face unique challenges due to the sheer scale of course content and staff. In this experience report, we share our attempts to scale accessibility efforts for a large university-level introductory programming course sequence, with over 3500 enrolled students and 100 teaching assistants (TAs) per year. First, we introduce our approach to auditing and remediating course materials by systematically identifying and resolving accessibility issues. However, remediating content post-hoc is purely reactive and scales poorly. We then discuss two approaches to systems that enable proactive accessibility work. We developed technical systems to manage remediation complexity at scale: redesigning other course content to be web-first and accessible by default, providing alternate accessible views for existing course content, and writing automated tests to receive instant feedback on a subset of accessibility issues. Separately, we established human systems to empower both course staff and students in accessibility best practices: developing and running various TA-targeted accessibility trainings, establishing course-wide accessibility norms, and integrating accessibility topics into core course curriculum. Preliminary qualitative feedback from both staff and students shows increased engagement in accessibility work and accessible technologies. We close by discussing limitations and lessons learned from our work, with advice for others developing similar auditing, remediation, technical, or human systems.

cs.CY
Scenario Insight大型大学编程入门课程
华盛顿大学introductory programming course sequence,超过3500名学生和100名TA
Landed
arXiv2025-11-13
Google

AI-generated podcasts: Synthetic Intimacy and Cultural Translation in NotebookLM's Audio Overviews

This paper analyses AI-generated podcasts produced by Google's NotebookLM, which generates audio podcasts with two chatty AI hosts discussing whichever documents a user uploads. While AI-generated podcasts have been discussed as tools, for instance in medical education, they have not yet been analysed as media. By uploading different types of text and analysing the generated outputs I show how the podcasts' structure is built around a fixed template. I also find that NotebookLM not only translates texts from other languages into a perky standardised Mid-Western American accent, it also translates cultural contexts to a white, educated, middle-class American default. This is a distinct development in how publics are shaped by media, marking a departure from the multiple public spheres that scholars have described in human podcasting from the early 2000s until today, where hosts spoke to specific communities and responded to listener comments, to an abstraction of the podcast genre.

cs.CYcs.AIcs.CL
Scenario InsightNotebookLM
生成AI主持人的音频播客,基于用户上传文档讨论内容
Landed
arXiv2025-11-13

DeepDR: an integrated deep-learning model web server for drug repositioning

Background: Identifying new indications for approved drugs is a complex and time-consuming process that requires extensive knowledge of pharmacology, clinical data, and advanced computational methods. Recently, deep learning (DL) methods have shown their capability for the accurate prediction of drug repositioning. However, implementing DL-based modeling requires in-depth domain knowledge and proficient programming skills. Results: In this application, we introduce DeepDR, the first integrated platform that combines a variety of established DL-based models for disease- and target-specific drug repositioning tasks. DeepDR leverages invaluable experience to recommend candidate drugs, which covers more than 15 networks and a comprehensive knowledge graph that includes 5.9 million edges across 107 types of relationships connecting drugs, diseases, proteins/genes, pathways, and expression from six existing databases and a large scientific corpus of 24 million PubMed publications. Additionally, the recommended results include detailed descriptions of the recommended drugs and visualize key patterns with interpretability through a knowledge graph. Conclusion: DeepDR is free and open to all users without the requirement of registration. We believe it can provide an easy-to-use, systematic, highly accurate, and computationally automated platform for both experimental and computational scientists.

cs.LGq-bio.QM
Scenario InsightDeepDR
药物再定位的深度学习模型集成web服务器
Landed
arXiv2025-11-13
Shanghai AI Laboratory

Unveiling the Impact of Data and Model Scaling on High-Level Control for Humanoid Robots

Data scaling has long remained a critical bottleneck in robot learning. For humanoid robots, human videos and motion data are abundant and widely available, offering a free and large-scale data source. Besides, the semantics related to the motions enable modality alignment and high-level robot control learning. However, how to effectively mine raw video, extract robot-learnable representations, and leverage them for scalable learning remains an open problem. To address this, we introduce Humanoid-Union, a large-scale dataset generated through an autonomous pipeline, comprising over 260 hours of diverse, high-quality humanoid robot motion data with semantic annotations derived from human motion videos. The dataset can be further expanded via the same pipeline. Building on this data resource, we propose SCHUR, a scalable learning framework designed to explore the impact of large-scale data on high-level control in humanoid robots. Experimental results demonstrate that SCHUR achieves high robot motion generation quality and strong text-motion alignment under data and model scaling, with 37\% reconstruction improvement under MPJPE and 25\% alignment improvement under FID comparing with previous methods. Its effectiveness is further validated through deployment in real-world humanoid robot.

cs.RO
Scenario Insight人形机器人高级控制
使用大规模人类视频生成的人形机器人运动数据,进行高层次控制学习,包括运动生成和文本-运动对齐。