Distributed systems are commonly built using a set of standard assumptions: we assume that message delays are unbounded, that any packet can be lost in the network, and that clocks cannot be closely synchronized. On the one hand, these conservative assumptions result in robust systems that can operate reliably in a wide variety of conditions. On the other hand, they also force the system to perform a great deal of complex ad hoc coordination and thus limit the performance it can achieve. In this paper, we take a look at what lies beyond this standard model. We observe that, on modern hardware in a single-tenant data center, distributed systems are able to closely coordinate and essentially “run like clockwork” with very little effort. If we are willing to additionally rule out some worst-case failure scenarios, this results in a large performance improvement, both in practice and even in theory. We demonstrate this effect using state-machine replication (SMR) as a case study: our SMR protocol, Watchmaker, exceeds the throughput of state-of-the-art algorithms by two orders of magnitude, and it requires only half as many replicas to tolerate the same number of faults.
@article{lu2026nineswatch,
  title={Running Distributed Systems Like Clockwork},
  author={Newatia, Karan and Gifford, Robert and Lu, Qingjie and Haeberlen, Andreas and Phan, Linh Thi Xuan},
  journal={Proceedings of the 1st New Ideas in Networked Systems (NiNES '26)},
  year={2026},
  doi={10.4230/OASIcs.NINeS.2026.26}
}
arXiv
Characterizing Metastable Faults and Failures
Ali Farahbakhsh, Qingjie Lu, Lorenzo Alvisi, and 2 more authors
Metastable failures are hard to detect, prevent, and mitigate. During a metastable failure, a system exhibits self-sustaining bad behavior even in the absence of adversarial conditions. Prior work focuses on symptoms and has portrayed metastable failures as instances of self-sustaining overload. This characterization leaves the underlying failure causes and dynamics unknown, and does not account for metastable failures that do not manifest as overload. We present the first causal characterization of metastable failures by identifying their origin in metastable faults, i.e., structural destabilizing cycles of interaction among system components that, in isolation, are stabilizing. Metastable failures arise when scheduling decisions let these destabilizing interactions gain the upper hand over the individual components’ stabilizing tendencies. We then derive a methodology to predict metastable failures, and to build metastable-fault-tolerant (MFT) systems. We apply our methodology to three case studies, showcasing the generality of our results.
@article{lu2026metastablefaults,
  title={Characterizing Metastable Faults and Failures},
  author={Farahbakhsh, Ali and Lu, Qingjie and Alvisi, Lorenzo and Haeberlen, Andreas and van Renesse, Robbert},
  journal={arXiv preprint arXiv:2606.00942},
  year={2026},
  doi={10.48550/arXiv.2606.00942}
}
2025
HotNets ’25
Modeling Metastability
Ali Farahbakhsh, Andreas Haeberlen, Qingjie Lu, and 3 more authors
Proceedings of the 24th ACM Workshop on Hot Topics in Networks (HotNets ’25), 2025
Recently, there has been increasing concern about a new failure mode in data-center systems: when there is an external shock, such as a sudden load spike or some machine failures, systems will sometimes respond with reduced throughput. In contrast to a traditional overload situation, however, the throughput does not recover once the external shock disappears, and remains permanently degraded. This phenomenon has been called a metastable failure. In this paper, we sketch a simple model that could help to explain how and why metastability arises. We also show how our model can be used to predict the presence or absence of metastable states in a given system.
@article{lu2025modeling,
  title={Modeling Metastability},
  author={Farahbakhsh, Ali and Haeberlen, Andreas and Lu, Qingjie and Alvisi, Lorenzo and van Renesse, Robbert and Gahtan, Shir Cohen},
  journal={Proceedings of the 24th ACM Workshop on Hot Topics in Networks (HotNets '25)},
  year={2025},
  doi={10.1145/3772356.3772426},
  note={<b>Author Names in Alphabetic Order</b>}
}
2022
ISET
Neural Network-Based Approaches for Aspect-Based Sentiment Analysis
Qingjie Lu
Highlights in Science, Engineering and Technology, 2022
Proceedings of the 4th International Conference on Information Science and Electronic Technology
Research on Aspect-Based Sentiment Analysis, a task with a narrower focus than general sentiment analysis, is growing steadily. Building on Recurrent Neural Networks (RNNs) and Convolutional Neural Networks (CNNs), novel approaches have introduced new components, such as Graph Convolutional Networks (GCNs) and Transformers, that dramatically improved overall accuracy. In addition to summarizing the models, this survey focuses on comparing several of these novel methods. Although this paper finds that the Dependency Graph Enhanced Dual-Transformer network (DGEDT) coupled with Bidirectional Encoder Representations from Transformers (BERT) is the best-performing model thus far, it also identifies challenges that need to be addressed in order to better evaluate current and future models.
@article{lu2022aspectbased,
  title={Neural Network-Based Approaches for Aspect-Based Sentiment Analysis},
  author={Lu, Qingjie},
  journal={Highlights in Science, Engineering and Technology},
  volume={12},
  pages={222--229},
  year={2022},
  doi={10.54097/hset.v12i.1457},
  url={https://drpress.org/ojs/index.php/HSET/article/view/1457},
  note={Proceedings of the 4th International Conference on Information Science and Electronic Technology}
}