云计算、AI、云原生、大数据等一站式技术学习平台

网站首页 > 教程文章 正文

阿里开源的分布式事务揭秘:Seata原理及流程剖析

jxf315 2025-02-07 17:54:55 教程文章 17 ℃

背景

在分布式系统中,分布式事务是一个必须要解决的问题,目前使用较多的是最终一致性方案。今天带来的这篇,就给大家分析一下Seata的源码是如何一步一步实现的。读源码的时候我们需要俯瞰起全貌,不要去扣一个一个的细节,这样我们学习起来会快捷而且有效率,我们学习源码需要掌握的是整体思路和核心点。


一、分布式事务简介

分布式事务有各种实现方案,不过大体可分为两类,一种不需要关注事务分支与全局事务的交互过程。另一种是将逻辑拆分成三个部分准备、提交、回滚,分支事务加入全局事务。这两种在Seata里前者称为AT模式,后者称为MT模式。


二、内容

1.分布式事务数据库操作型
MySQL XA方案 是其中一种,这种的话是直接作用于数据库。
其中RM执行本地事务提交与回滚;TM是分布式事务核心管理。
缺点的话,一是不适用于微服务,二很明显由于每次操作不提交,最后导致数据未提交越来越多时候,性能就不是很好了。那么像Seata这种业务层面的解决就相对而已来说性能强大了很多。


2.Seata分布式事务详解
Seata分布式事务是业务层民的解决方案。
而且只依赖于单台数据的事务能力。
Seata包括三个role:
1> TC 事务协调,负责协调并驱动全局事务的提交与回滚
2> TM 控制全局事务边界,负责开启全局事务,决定全局事务的提交与回滚
3> RM 分支事务,负责分支注册,接收TC指令,驱动分支事务回滚与提交
自己理解后画了个菜鸟图,如下:

3.XA vs Seata AT

AT不需要XA协议,适配于微服务,XA事务性资源的锁都要保持到第二阶段 完成才释放,AT分本地锁和全局锁,本地锁由本地事务管理,全局锁由全局事务管理,在决议第二阶段全局提交时,全局锁马上可以释放。


相关概念


  • XID:一个全局事务的唯一标识,由ip:port:sequence组成
  • Transaction Coordinator (TC): 事务协调器,维护全局事务的运行状态,负责协调并驱动全局事务的提交或回滚。
  • Transaction Manager ?: 控制全局事务的边界,负责开启一个全局事务,并最终发起全局提交或全局回滚的决议。
  • Resource Manager (RM): 控制分支事务,负责分支注册、状态汇报,并接收事务协调器的指令,驱动分支(本地)事务的提交和回滚。


原理

seata涉及到三个角色之间的交互,本文通过流程图将AT模式下的基本交互流程梳理一下,为我们以后的解析打下基础。
假设有三个微服务,分别是服务A、B、C,其中服务A中调用了服务B和服务C,TM、TC、RM三者之间的交互流程如下图:

  1. 1、服务A启动时,GlobalTransactionScanner会对有@GlobalTransaction注解的方法进行AOP增强,并生成代理,增强的代码位于GlobalTransactionalInterceptor类中,当调用@GlobalTransaction注解的方法时,增强代码首先向TC注册全局事务,表示全局事务的开始,同时TC生成XID,并返回给TM;
  2. 2、服务A中调用服务B时,将XID传递给服务B;
  3. 3、服务B得到XID后,访问TC,注册分支事务,并从TC获得分支事务ID,TC根据XID将分支事务与全局事务关联;
  4. 4、接下来服务B开始执行SQL语句,在执行前将表中对应的数据保存一份,执行后在保存一份,将这两份记录作为回滚记录写入到数据库中,如果执行过程中没有异常,服务B最后将事务提交,并通知TC分支事务成功,服务B也会清除本地事务数据;
  5. 5、服务A访问完服务B后,访问服务C;
  6. 6、服务C与TC之间的交互与服务B完全一致;
  7. 7、服务B和服务C都成功后,服务A通过TM通知TC全局事务成功,如果失败了,服务A也会通知TC全局事务失败;
  8. 8、TC记录了全局事务下的每个分支事务,TC收到全局事务的结果后,如果结果成功,则通知RM成功,RM收到通知后清理之前在数据库中保存的回滚记录,如果失败了,则RM要查询出之前在数据库保存的回滚记录,对之前的SQL操作进行回滚。
    因为TM、RM、TC之间的交互都是通过网络完成的,很容易出现网络断开的情况,因此TC提供了四个定时线程池,定时检测系统中是否有超时事务、异步提交事务、回滚重试事务、重试提交事务,如果发现了有这四类事务,则从全局事务中获取所有的分支事务,分别调用各个分支事务完成对应的操作,依次来确保事务的一致性。


需要考虑的问题:
通过上面流程的分析可以发现,每次SQL操作(查询除外)时,都会增加额外了三次数据库操作;每次全局事务和分支事务开启时,都涉及到TM、RM与TC的交互;全局事务期间还要承担数据短时不一致的情况,这些都是我们在使用AT模式需要考虑的情况。

项目依赖

seata使用XID表示一个分布式事务,XID需要在一次分布式事务请求所涉的系统中进行传递,从而向feacar-server发送分支事务的处理情况,以及接收feacar-server的commit、rollback指令。所以在分布式系统中使用seata要解决XID的传递问题。seata目前支持全版本的dubbo,对于spring cloud的分布式项目社区也提供了相应的实现



    org.springframework.cloud
    spring-cloud-alibaba-fescar

该组件实现了基于RestTemplate、Feign通讯时的XID传递功能,详细说明见集成源码深度剖析:Fescar x Spring Cloud


spring-cloud-alibaba-fescar内已包含了fescar-spring的依赖,所以可以不另外引入,查看完整的pom.xml.

业务逻辑

业务逻辑是经典的下订单、扣余额、减库存流程。根据模块划分为三个独立的服务,且分别连接对应的数据库

  1. 订单:order-server
  2. 账户:account-server
  3. 库存:storage-server

另外还有发起分布式事务的业务系统

  • 业务:business-server

项目结构如下图

配置文件

seata的配置文件入口为registry.conf查看代码ConfigurationFactory得知目前还不能指定该配置文件,所以名称只能为registry.conf


private static final String REGISTRY_CONF = "registry.conf";
    
public static final Configuration FILE_INSTANCE = new FileConfiguration(REGISTRY_CONF);

在registry中可以指定具体配置的形式,这里使用默认的file形式。在file.conf中有3部分配置内容


1.transport
transport部分的配置对应NettyServerConfig类,用于定义Netty的相关参与,client与server的通信使用的Netty
2.service


service {
 #vgroup->rgroup
 vgroup_mapping.my_test_tx_group = "default"
 #配置Client连接TC的地址
 default.grouplist = "127.0.0.1:8091"
 #degrade current not support
 enableDegrade = false
 #disable
 是否启用seata的分布式事务
 disableGlobalTransaction = false
} 


//部分代码
public class GlobalTransactionScanner{
    private final boolean disableGlobalTransaction =
           ConfigurationFactory.getInstance().getBoolean("service.disableGlobalTransaction", false);
    
    public void afterPropertiesSet() {
        if (disableGlobalTransaction) {
            if (LOGGER.isInfoEnabled()) {
                LOGGER.info("Global transaction is disabled.");
            }
            return;
        }
        initClient();
    }
}

3.client


 client {
  #RM接收TC的commit通知后缓冲上限
  async.commit.buffer.limit = 10000
  lock {
    retry.internal = 10
    retry.times = 30
  }
}


启动Server

前往
https://github.com/seata/seata/releases 下载最新版本的 Fescar Server

解压之后的 bin 目录,执行


./fescar-server.sh 8091 ../data

启动成功输出


2019-04-09 20:27:24.637 INFO [main]c.a.fescar.core.rpc.netty.AbstractRpcRemotingServer.start:152 -Server started ... 

启动Client

对于Spring boot项目,启动运行xxxApplication的main方法即可,seata的加载入口类位于
GlobalTransactionAutoConfiguration


@Configuration
@EnableConfigurationProperties({FescarProperties.class})
public class GlobalTransactionAutoConfiguration {
    private final ApplicationContext applicationContext;
    private final FescarProperties fescarProperties;

    public GlobalTransactionAutoConfiguration(ApplicationContext applicationContext, FescarProperties fescarProperties) {
        this.applicationContext = applicationContext;
        this.fescarProperties = fescarProperties;
    }

    @Bean
    public GlobalTransactionScanner globalTransactionScanner() {
        String applicationName = this.applicationContext.getEnvironment().getProperty("spring.application.name");
        String txServiceGroup = this.fescarProperties.getTxServiceGroup();
        if (StringUtils.isEmpty(txServiceGroup)) {
            txServiceGroup = applicationName + "-fescar-service-group";
            this.fescarProperties.setTxServiceGroup(txServiceGroup);
        }
        
        return new GlobalTransactionScanner(applicationName, txServiceGroup);
    }
}

可以看到支持一个配置项FescarProperties,用于配置事务分组名称


spring.cloud.alibaba.fescar.tx-service-group=my_test_tx_group

如果不指定则用spring.application.name+ -fescar-service-group生成一个名称,所以不指定spring.application.name启动会报错


@ConfigurationProperties("spring.cloud.alibaba.fescar")
public class FescarProperties {
    private String txServiceGroup;

    public FescarProperties() {
    }

    public String getTxServiceGroup() {
        return this.txServiceGroup;
    }

    public void setTxServiceGroup(String txServiceGroup) {
        this.txServiceGroup = txServiceGroup;
    }
}

有了applicationId和txServiceGroup则创建GlobalTransactionScanner对象,主要看其中的initClient方法


private void initClient() {
    if (LOGGER.isInfoEnabled()) {
        LOGGER.info("Initializing Global Transaction Clients ... ");
    }
    if (StringUtils.isNullOrEmpty(applicationId) || StringUtils.isNullOrEmpty(txServiceGroup)) {
        throw new IllegalArgumentException(
            "applicationId: " + applicationId + ", txServiceGroup: " + txServiceGroup);
    }
    //init TM
    TMClient.init(applicationId, txServiceGroup);
    if (LOGGER.isInfoEnabled()) {
        LOGGER.info(
            "Transaction Manager Client is initialized. applicationId[" + applicationId + "] txServiceGroup["
                + txServiceGroup + "]");
    }
    //init RM
    RMClient.init(applicationId, txServiceGroup);
    if (LOGGER.isInfoEnabled()) {
        LOGGER.info("Resource Manager is initialized. applicationId[" + applicationId  + "] txServiceGroup["  + txServiceGroup + "]");
    }

    if (LOGGER.isInfoEnabled()) {
        LOGGER.info("Global Transaction Clients are initialized. ");
    }
}

可以看到初始化了TMClient和RMClient,对于一个服务既可以是TM角色也可以是RM角色,至于什么时候是TM或者RM则要看在一次全局事务中@GlobalTransactional注解标注在哪。

Client创建的结果是与TC的一个Netty连接,所以在启动日志中可以看到两个Netty Channel,其中也标明了transactionRole分别为TMROLE和RMROLE


NettyPool create channel to {"address":"127.0.0.1:8091","message":{"applicationId":"order-service","byteBuffer":{"char":"\u0000","direct":false,"double":0.0,"float":0.0,"int":0,"long":0,"readOnly":false,"short":0},"transactionServiceGroup":"hello-service-fescar-service-group","typeCode":101,"version":"0.4.0"},"transactionRole":"TMROLE"}
NettyPool create channel to {"address":"127.0.0.1:8091","message":{"applicationId":"order-service","byteBuffer":{"char":"\u0000","direct":false,"double":0.0,"float":0.0,"int":0,"long":0,"readOnly":false,"short":0},"resourceIds":"jdbc:mysql://127.0.0.1:3306/db_order?useSSL=false","transactionServiceGroup":"hello-service-fescar-service-group","typeCode":103,"version":"0.4.0"},"transactionRole":"RMROLE"}
Send:RegisterTMRequest{applicationId='order-service', transactionServiceGroup='hello-service-fescar-service-group'}
Send:RegisterRMRequest{resourceIds='jdbc:mysql://127.0.0.1:3306/db_order?useSSL=false', applicationId='order-service', transactionServiceGroup='hello-service-fescar-service-group'}
Receive:version=0.4.1,extraData=null,identified=true,resultCode=null,msg=null,messageId:2
Receive:version=0.4.1,extraData=null,identified=true,resultCode=null,msg=null,messageId:1
com.alibaba.fescar.core.rpc.netty.RmRpcClient@7904cd7c msgId:2, future :com.alibaba.fescar.core.protocol.MessageFuture@4107849f, body:version=0.4.1,extraData=null,identified=true,resultCode=null,msg=null
com.alibaba.fescar.core.rpc.netty.TmRpcClient@68609034 msgId:1, future :com.alibaba.fescar.core.protocol.MessageFuture@527cc144, body:version=0.4.1,extraData=null,identified=true,resultCode=null,msg=null
register success, cost 28 ms, version:0.4.1,role:TMROLE,channel:[id: 0xf45059d4, L:/127.0.0.1:63533 - R:/127.0.0.1:8091]
register RM success. server version:0.4.1,channel:[id: 0xb7674b6a, L:/127.0.0.1:63534 - R:/127.0.0.1:8091]
register success, cost 37 ms, version:0.4.1,role:RMROLE,channel:[id: 0xb7674b6a, L:/127.0.0.1:63534 - R:/127.0.0.1:8091]

日志中可以看到创建连接后,发送了注册请求,然后得到了结果相应,RmRpcClient、TmRpcClient成功实例化。


TM处理流程



在本例中,TM的角色是business-service,因为BusinessService的purchase方法标注了@GlobalTransactional


@Service
public class BusinessService {

    @Autowired
    private StorageFeignClient storageFeignClient;
    @Autowired
    private OrderFeignClient orderFeignClient;

    @GlobalTransactional
    public void purchase(String userId, String commodityCode, int orderCount){
        storageFeignClient.deduct(commodityCode, orderCount);

        orderFeignClient.create(userId, commodityCode, orderCount);
    }
}

GET请求
127.0.0.1:8084/purchaseuserId=1001&commodityCode=2001&orderCount=1看看会发生什么


全局事务开启
首先需要关注的是@GlobalTransactional注解的作用,它是在
GlobalTransactionalInterceptor中被拦截处理


//部分代码
public class GlobalTransactionalInterceptor implements MethodInterceptor {
    @Override
    public Object invoke(final MethodInvocation methodInvocation) throws Throwable {
        Class targetClass = (methodInvocation.getThis() != null ? AopUtils.getTargetClass(methodInvocation.getThis()) : null);
        Method specificMethod = ClassUtils.getMostSpecificMethod(methodInvocation.getMethod(), targetClass);
        final Method method = BridgeMethodResolver.findBridgedMethod(specificMethod);

        //获取方法GlobalTransactional注解
        final GlobalTransactional globalTransactionalAnnotation = getAnnotation(method, GlobalTransactional.class);
        final GlobalLock globalLockAnnotation = getAnnotation(method, GlobalLock.class);
        
        //如果方法有GlobalTransactional注解,则进行相应处理
        if (globalTransactionalAnnotation != null) {
            return handleGlobalTransaction(methodInvocation, globalTransactionalAnnotation);
        } else if (globalLockAnnotation != null) {
            return handleGlobalLock(methodInvocation);
        } else {
            return methodInvocation.proceed();
        }
    }
    
    //调用了TransactionalTemplate
    private Object handleGlobalTransaction(final MethodInvocation methodInvocation,
                                           final GlobalTransactional globalTrxAnno) throws Throwable {
        try {
            return transactionalTemplate.execute(new TransactionalExecutor() {
                @Override
                public Object execute() throws Throwable {
                    return methodInvocation.proceed();
                }

                @Override
                public int timeout() {
                    return globalTrxAnno.timeoutMills();
                }

                @Override
                public String name() {
                    String name = globalTrxAnno.name();
                    if (!StringUtils.isNullOrEmpty(name)) {
                        return name;
                    }
                    return formatMethod(methodInvocation.getMethod());
                }
            });
        } catch (TransactionalExecutor.ExecutionException e) {
            TransactionalExecutor.Code code = e.getCode();
            switch (code) {
                case RollbackDone:
                    throw e.getOriginalException();
                case BeginFailure:
                    failureHandler.onBeginFailure(e.getTransaction(), e.getCause());
                    throw e.getCause();
                case CommitFailure:
                    failureHandler.onCommitFailure(e.getTransaction(), e.getCause());
                    throw e.getCause();
                case RollbackFailure:
                    failureHandler.onRollbackFailure(e.getTransaction(), e.getCause());
                    throw e.getCause();
                default:
                    throw new ShouldNeverHappenException("Unknown TransactionalExecutor.Code: " + code);
            }
        }
    }
}

TransactionalTemplate定义了TM对全局事务处理的标准步骤,注释写的比较清楚了


public class TransactionalTemplate {
    public Object execute(TransactionalExecutor business) throws TransactionalExecutor.ExecutionException {

        // 1. get or create a transaction
        GlobalTransaction tx = GlobalTransactionContext.getCurrentOrCreate();

        try {

            // 2. begin transaction
            try {
                triggerBeforeBegin();
                tx.begin(business.timeout(), business.name());
                triggerAfterBegin();
            } catch (TransactionException txe) {
                throw new TransactionalExecutor.ExecutionException(tx, txe,
                    TransactionalExecutor.Code.BeginFailure);

            }
            Object rs = null;
            try {

                // Do Your Business
                rs = business.execute();

            } catch (Throwable ex) {

                // 3. any business exception, rollback.
                try {
                    triggerBeforeRollback();
                    tx.rollback();
                    triggerAfterRollback();
                    // 3.1 Successfully rolled back
                    throw new TransactionalExecutor.ExecutionException(tx, TransactionalExecutor.Code.RollbackDone, ex);

                } catch (TransactionException txe) {
                    // 3.2 Failed to rollback
                    throw new TransactionalExecutor.ExecutionException(tx, txe,
                        TransactionalExecutor.Code.RollbackFailure, ex);
                }
            }
            // 4. everything is fine, commit.
            try {
                triggerBeforeCommit();
                tx.commit();
                triggerAfterCommit();
            } catch (TransactionException txe) {
                // 4.1 Failed to commit
                throw new TransactionalExecutor.ExecutionException(tx, txe,
                    TransactionalExecutor.Code.CommitFailure);
            }

            return rs;
        } finally {
            //5. clear
            triggerAfterCompletion();
            cleanUp();
        }
    }
}

其中DefaultGlobalTransaction的begin方法就是开启全局事务



@Override
public void begin(int timeout, String name) throws TransactionException {
    //此处的角色判断有关键的作用
    //表明当前是全局事务的发起者(Launcher)还是参与者(Participant)
    //如果在分布式事务的下游系统方法中也加上GlobalTransactional注解
    //那么它的角色就是Participant,即会忽略后面的begin就退出了
    //而判断是发起者(Launcher)还是参与者(Participant)是根据当前上下文是否已存在XID来判断
    //没有XID的就是Launcher,已经存在XID的就是Participant
    if (role != GlobalTransactionRole.Launcher) {
        check();
        if (LOGGER.isDebugEnabled()) {
            LOGGER.debug("Ignore Begin(): just involved in global transaction [" + xid + "]");
        }
        return;
    }
    if (xid != null) {
        throw new IllegalStateException();
    }
    if (RootContext.getXID() != null) {
        throw new IllegalStateException();
    }
    //具体开启事务的方法,获取TC返回的XID
    xid = transactionManager.begin(null, null, name, timeout);
    status = GlobalStatus.Begin;
    RootContext.bind(xid);
    if (LOGGER.isDebugEnabled()) {
        LOGGER.debug("Begin a NEW global transaction [" + xid + "]");
    }
} 

DefaultTransactionManager负责TM与TC通讯,发送begin、commit、rollback指令


@Override
public String begin(String applicationId, String transactionServiceGroup, String name, int timeout)
    throws TransactionException {
    GlobalBeginRequest request = new GlobalBeginRequest();
    request.setTransactionName(name);
    request.setTimeout(timeout);
    GlobalBeginResponse response = (GlobalBeginResponse)syncCall(request);
    return response.getXid();
}
 


至此拿到TC返回的XID一个全局事务就开启了,日志中也反应了上述流程


2019-04-09 13:46:57.417 DEBUG 31326 --- [nio-8084-exec-1] c.a.f.c.rpc.netty.AbstractRpcRemoting    : offer message: timeout=60000,transactionName=purchase(java.lang.String,java.lang.String,int)
2019-04-09 13:46:57.417 DEBUG 31326 --- [geSend_TMROLE_1] c.a.f.c.rpc.netty.AbstractRpcRemoting    : write message:FescarMergeMessage timeout=60000,transactionName=purchase(java.lang.String,java.lang.String,int), channel:[id: 0xa148545e, L:/127.0.0.1:56120 - R:/127.0.0.1:8091],active?true,writable?true,isopen?true
2019-04-09 13:46:57.418 DEBUG 31326 --- [lector_TMROLE_1] c.a.f.c.rpc.netty.MessageCodecHandler    : Send:FescarMergeMessage timeout=60000,transactionName=purchase(java.lang.String,java.lang.String,int)
2019-04-09 13:46:57.421 DEBUG 31326 --- [lector_TMROLE_1] c.a.f.c.rpc.netty.MessageCodecHandler    : Receive:MergeResultMessage com.alibaba.fescar.core.protocol.transaction.GlobalBeginResponse@2dc480dc,messageId:1196
2019-04-09 13:46:57.421 DEBUG 31326 --- [nio-8084-exec-1] c.a.fescar.core.context.RootContext      : bind 192.168.224.93:8091:2008502699
2019-04-09 13:46:57.421 DEBUG 31326 --- [nio-8084-exec-1] c.a.f.tm.api.DefaultGlobalTransaction    : Begin a NEW global transaction [192.168.224.93:8091:2008502699] 


全局事务创建后,就开始执行business.execute(),即业务代码storageFeignClient.deduct(commodityCode, orderCount);进入RM处理流程


RM处理流程


@GetMapping(path = "/deduct")
public Boolean deduct(String commodityCode, Integer count){
    storageService.deduct(commodityCode,count);
    return true;
}

@Transactional
public void deduct(String commodityCode, int count){
    Storage storage = storageDAO.findByCommodityCode(commodityCode);
    storage.setCount(storage.getCount()-count);

    storageDAO.save(storage);

    if (count == 5){
        throw new RuntimeException("storage branch exception");
    }
} 


storage的接口和service方法并未出现seata相关的代码和注解,那么它是如何加入到这次全局事务中的呢,答案是ConnectionProxy中,这也是前面说为什么必须要使用DataSourceProxy的原因,通过DataSourceProxy才能在业务代码的事务提交时,seata通过这个切入点,来给TC发送rm的处理结果


由于业务代码本身的事务提交被ConnectionProxy代理,所以在提交本地事务时,实际执行的是ConnectionProxy的commit方法


//部分代码
public class ConnectionProxy extends AbstractConnectionProxy {
    @Override
    public void commit() throws SQLException {
        //如果当前是全局事务,则执行全局事务的提交
        //判断是不是全局事务,就是看当前上下文是否存在XID
        if (context.inGlobalTransaction()) {
            processGlobalTransactionCommit();
        } else if (context.isGlobalLockRequire()) {
            processLocalCommitWithGlobalLocks();
        } else {
            targetConnection.commit();
        }
    }
    
    private void processGlobalTransactionCommit() throws SQLException {
        try {
            //首先是向TC注册RM,拿到TC分配的branchId
            register();
        } catch (TransactionException e) {
            recognizeLockKeyConflictException(e);
        }

        try {
            if (context.hasUndoLog()) {
                //写入undolog
                UndoLogManager.flushUndoLogs(this);
            }
            
            //提交本地事务,可以看到写入undolog和业务数据是在同一个本地事务中
            targetConnection.commit();
        } catch (Throwable ex) {
            //向TC发送rm的事务处理失败的通知
            report(false);
            if (ex instanceof SQLException) {
                throw new SQLException(ex);
            }
        }
        //向TC发送rm的事务处理成功的通知
        report(true);
        context.reset();
    }
    
    //注册RM,构建request通过netty向TC发送指令
    //将返回的branchId存在上下文中
    private void register() throws TransactionException {
        Long branchId = DefaultResourceManager.get().branchRegister(BranchType.AT, getDataSourceProxy().getResourceId(),
                null, context.getXid(), null, context.buildLockKeys());
        context.setBranchId(branchId);
    }
} 


通过日志印证一下上面的流程


2019-04-09 21:57:48.341 DEBUG 38933 --- [nio-8081-exec-1] o.s.c.a.f.web.FescarHandlerInterceptor   : xid in RootContext null xid in RpcContext 192.168.0.2:8091:2008546211
2019-04-09 21:57:48.341 DEBUG 38933 --- [nio-8081-exec-1] c.a.fescar.core.context.RootContext      : bind 192.168.0.2:8091:2008546211
2019-04-09 21:57:48.341 DEBUG 38933 --- [nio-8081-exec-1] o.s.c.a.f.web.FescarHandlerInterceptor   : bind 192.168.0.2:8091:2008546211 to RootContext
2019-04-09 21:57:48.386  INFO 38933 --- [nio-8081-exec-1] o.h.h.i.QueryTranslatorFactoryInitiator  : HHH000397: Using ASTQueryTranslatorFactory
Hibernate: select storage0_.id as id1_0_, storage0_.commodity_code as commodit2_0_, storage0_.count as count3_0_ from storage_tbl storage0_ where storage0_.commodity_code=?
Hibernate: update storage_tbl set count=? where id=?
2019-04-09 21:57:48.673  INFO 38933 --- [nio-8081-exec-1] c.a.fescar.core.rpc.netty.RmRpcClient    : will connect to 192.168.0.2:8091
2019-04-09 21:57:48.673  INFO 38933 --- [nio-8081-exec-1] c.a.fescar.core.rpc.netty.RmRpcClient    : RM will register :jdbc:mysql://127.0.0.1:3306/db_storage?useSSL=false
2019-04-09 21:57:48.673  INFO 38933 --- [nio-8081-exec-1] c.a.f.c.rpc.netty.NettyPoolableFactory   : NettyPool create channel to {"address":"192.168.0.2:8091","message":{"applicationId":"storage-service","byteBuffer":{"char":"\u0000","direct":false,"double":0.0,"float":0.0,"int":0,"long":0,"readOnly":false,"short":0},"resourceIds":"jdbc:mysql://127.0.0.1:3306/db_storage?useSSL=false","transactionServiceGroup":"hello-service-fescar-service-group","typeCode":103,"version":"0.4.0"},"transactionRole":"RMROLE"}
2019-04-09 21:57:48.677 DEBUG 38933 --- [lector_RMROLE_1] c.a.f.c.rpc.netty.MessageCodecHandler    : Send:RegisterRMRequest{resourceIds='jdbc:mysql://127.0.0.1:3306/db_storage?useSSL=false', applicationId='storage-service', transactionServiceGroup='hello-service-fescar-service-group'}
2019-04-09 21:57:48.680 DEBUG 38933 --- [lector_RMROLE_1] c.a.f.c.rpc.netty.MessageCodecHandler    : Receive:version=0.4.1,extraData=null,identified=true,resultCode=null,msg=null,messageId:9
2019-04-09 21:57:48.680 DEBUG 38933 --- [lector_RMROLE_1] c.a.f.c.rpc.netty.AbstractRpcRemoting    : com.alibaba.fescar.core.rpc.netty.RmRpcClient@7d61f5d4 msgId:9, future :com.alibaba.fescar.core.protocol.MessageFuture@186cd3e0, body:version=0.4.1,extraData=null,identified=true,resultCode=null,msg=null
2019-04-09 21:57:48.680  INFO 38933 --- [nio-8081-exec-1] c.a.fescar.core.rpc.netty.RmRpcClient    : register RM success. server version:0.4.1,channel:[id: 0xd40718e3, L:/192.168.0.2:62607 - R:/192.168.0.2:8091]
2019-04-09 21:57:48.680  INFO 38933 --- [nio-8081-exec-1] c.a.f.c.rpc.netty.NettyPoolableFactory   : register success, cost 3 ms, version:0.4.1,role:RMROLE,channel:[id: 0xd40718e3, L:/192.168.0.2:62607 - R:/192.168.0.2:8091]
2019-04-09 21:57:48.680 DEBUG 38933 --- [nio-8081-exec-1] c.a.f.c.rpc.netty.AbstractRpcRemoting    : offer message: transactionId=2008546211,branchType=AT,resourceId=jdbc:mysql://127.0.0.1:3306/db_storage?useSSL=false,lockKey=storage_tbl:1
2019-04-09 21:57:48.681 DEBUG 38933 --- [geSend_RMROLE_1] c.a.f.c.rpc.netty.AbstractRpcRemoting    : write message:FescarMergeMessage transactionId=2008546211,branchType=AT,resourceId=jdbc:mysql://127.0.0.1:3306/db_storage?useSSL=false,lockKey=storage_tbl:1, channel:[id: 0xd40718e3, L:/192.168.0.2:62607 - R:/192.168.0.2:8091],active?true,writable?true,isopen?true
2019-04-09 21:57:48.681 DEBUG 38933 --- [lector_RMROLE_1] c.a.f.c.rpc.netty.MessageCodecHandler    : Send:FescarMergeMessage transactionId=2008546211,branchType=AT,resourceId=jdbc:mysql://127.0.0.1:3306/db_storage?useSSL=false,lockKey=storage_tbl:1
2019-04-09 21:57:48.687 DEBUG 38933 --- [lector_RMROLE_1] c.a.f.c.rpc.netty.MessageCodecHandler    : Receive:MergeResultMessage BranchRegisterResponse: transactionId=2008546211,branchId=2008546212,result code =Success,getMsg =null,messageId:11
2019-04-09 21:57:48.702 DEBUG 38933 --- [nio-8081-exec-1] c.a.f.rm.datasource.undo.UndoLogManager  : Flushing UNDO LOG: {"branchId":2008546212,"sqlUndoLogs":[{"afterImage":{"rows":[{"fields":[{"keyType":"PrimaryKey","name":"id","type":4,"value":1},{"keyType":"NULL","name":"count","type":4,"value":993}]}],"tableName":"storage_tbl"},"beforeImage":{"rows":[{"fields":[{"keyType":"PrimaryKey","name":"id","type":4,"value":1},{"keyType":"NULL","name":"count","type":4,"value":994}]}],"tableName":"storage_tbl"},"sqlType":"UPDATE","tableName":"storage_tbl"}],"xid":"192.168.0.2:8091:2008546211"}
2019-04-09 21:57:48.755 DEBUG 38933 --- [nio-8081-exec-1] c.a.f.c.rpc.netty.AbstractRpcRemoting    : offer message: transactionId=2008546211,branchId=2008546212,resourceId=null,status=PhaseOne_Done,applicationData=null
2019-04-09 21:57:48.755 DEBUG 38933 --- [geSend_RMROLE_1] c.a.f.c.rpc.netty.AbstractRpcRemoting    : write message:FescarMergeMessage transactionId=2008546211,branchId=2008546212,resourceId=null,status=PhaseOne_Done,applicationData=null, channel:[id: 0xd40718e3, L:/192.168.0.2:62607 - R:/192.168.0.2:8091],active?true,writable?true,isopen?true
2019-04-09 21:57:48.756 DEBUG 38933 --- [lector_RMROLE_1] c.a.f.c.rpc.netty.MessageCodecHandler    : Send:FescarMergeMessage transactionId=2008546211,branchId=2008546212,resourceId=null,status=PhaseOne_Done,applicationData=null
2019-04-09 21:57:48.758 DEBUG 38933 --- [lector_RMROLE_1] c.a.f.c.rpc.netty.MessageCodecHandler    : Receive:MergeResultMessage com.alibaba.fescar.core.protocol.transaction.BranchReportResponse@582a08cf,messageId:13
2019-04-09 21:57:48.799 DEBUG 38933 --- [nio-8081-exec-1] c.a.fescar.core.context.RootContext      : unbind 192.168.0.2:8091:2008546211
2019-04-09 21:57:48.799 DEBUG 38933 --- [nio-8081-exec-1] o.s.c.a.f.web.FescarHandlerInterceptor   : unbind 192.168.0.2:8091:2008546211 from RootContext


  1. 获取business-service传来的XID
  2. 绑定XID到当前上下文中
  3. 执行业务逻辑sql
  4. 向TC创建本次RM的Netty连接
  5. 向TC发送分支事务的相关信息
  6. 获得TC返回的branchId
  7. 记录Undo Log数据
  8. 向TC发送本次事务PhaseOne阶段的处理结果
  9. 从当前上下文中解绑XID


其中第1步和第9步,是在FescarHandlerInterceptor中完成的,该类并不属于seata,而是
spring-cloud-alibaba-fescar中对feign、rest支持的实现。bind和unbind XID到上下文中。到这里RM完成了PhaseOne阶段的工作,接着看PhaseTwo阶段的处理逻辑。


事务提交



由于这次请求是正常流程无异常的,所以分支事务会正常commit。
在storage-service启动时创建了与TC通讯的Netty连接,TC在获取各RM的汇报结果后,就会给各RM发送commit或rollback的指令


2019-04-09 21:57:49.813 DEBUG 38933 --- [lector_RMROLE_1] c.a.f.c.rpc.netty.MessageCodecHandler    : Receive:xid=192.168.0.2:8091:2008546211,branchId=2008546212,branchType=AT,resourceId=jdbc:mysql://127.0.0.1:3306/db_storage?useSSL=false,applicationData=null,messageId:1
2019-04-09 21:57:49.813 DEBUG 38933 --- [lector_RMROLE_1] c.a.f.c.rpc.netty.AbstractRpcRemoting    : com.alibaba.fescar.core.rpc.netty.RmRpcClient@7d61f5d4 msgId:1, body:xid=192.168.0.2:8091:2008546211,branchId=2008546212,branchType=AT,resourceId=jdbc:mysql://127.0.0.1:3306/db_storage?useSSL=false,applicationData=null
2019-04-09 21:57:49.814  INFO 38933 --- [atch_RMROLE_1_8] c.a.f.core.rpc.netty.RmMessageListener   : onMessage:xid=192.168.0.2:8091:2008546211,branchId=2008546212,branchType=AT,resourceId=jdbc:mysql://127.0.0.1:3306/db_storage?useSSL=false,applicationData=null
2019-04-09 21:57:49.816  INFO 38933 --- [atch_RMROLE_1_8] com.alibaba.fescar.rm.AbstractRMHandler  : Branch committing: 192.168.0.2:8091:2008546211 2008546212 jdbc:mysql://127.0.0.1:3306/db_storage?useSSL=false null
2019-04-09 21:57:49.816  INFO 38933 --- [atch_RMROLE_1_8] com.alibaba.fescar.rm.AbstractRMHandler  : Branch commit result: PhaseTwo_Committed
2019-04-09 21:57:49.817  INFO 38933 --- [atch_RMROLE_1_8] c.a.fescar.core.rpc.netty.RmRpcClient    : RmRpcClient sendResponse branchStatus=PhaseTwo_Committed,result code =Success,getMsg =null
2019-04-09 21:57:49.817 DEBUG 38933 --- [atch_RMROLE_1_8] c.a.f.c.rpc.netty.AbstractRpcRemoting    : send response:branchStatus=PhaseTwo_Committed,result code =Success,getMsg =null,channel:[id: 0xd40718e3, L:/192.168.0.2:62607 - R:/192.168.0.2:8091]
2019-04-09 21:57:49.817 DEBUG 38933 --- [lector_RMROLE_1] c.a.f.c.rpc.netty.MessageCodecHandler    : Send:branchStatus=PhaseTwo_Committed,result code =Success,getMsg =null

从日志中可以看到


  1. 收到XID=192.168.0.2:8091:2008546211,branchId=2008546212的commit通知
  2. 执行commit动作
  3. 将commit结果发送给TC,branchStatus为PhaseTwo_Committed


具体看下执行commit的过程,在AbstractRMHandler类的doBranchCommit方法之前是接收TC消息包装处理路由的过程


//拿到通知的xid、branchId等关键参数
//然后调用RM的branchCommit
protected void doBranchCommit(BranchCommitRequest request, BranchCommitResponse response) throws TransactionException {
    String xid = request.getXid();
    long branchId = request.getBranchId();
    String resourceId = request.getResourceId();
    String applicationData = request.getApplicationData();
    LOGGER.info("Branch committing: " + xid + " " + branchId + " " + resourceId + " " + applicationData);
    BranchStatus status = getResourceManager().branchCommit(request.getBranchType(), xid, branchId, resourceId, applicationData);
    response.setBranchStatus(status);
    LOGGER.info("Branch commit result: " + status);
} 

最终会将branceCommit的请求调用到AsyncWorker的branchCommit方法。AsyncWorker的处理方式是seata架构的一个关键部分,大部分事务都是会正常提交的,所以在PhaseOne阶段就已经结束了,这样就可以将锁最快的释放。PhaseTwo阶段接收commit的指令后,异步处理即可。将PhaseTwo的时间消耗排除在一次分布式事务之外。


//部分代码
public class AsyncWorker implements ResourceManagerInbound {

    private static final List ASYNC_COMMIT_BUFFER = Collections.synchronizedList(
        new ArrayList());
        
    //将需要提交的XID加入list
    @Override
    public BranchStatus branchCommit(BranchType branchType, String xid, long branchId, String resourceId, String applicationData) throws TransactionException {
        if (ASYNC_COMMIT_BUFFER.size() < ASYNC_COMMIT_BUFFER_LIMIT) {
            ASYNC_COMMIT_BUFFER.add(new Phase2Context(branchType, xid, branchId, resourceId, applicationData));
        } else {
            LOGGER.warn("Async commit buffer is FULL. Rejected branch [" + branchId + "/" + xid + "] will be handled by housekeeping later.");
        }
        return BranchStatus.PhaseTwo_Committed;
    }
    
    //通过一个定时任务消费list中的待提交XID
    public synchronized void init() {
        LOGGER.info("Async Commit Buffer Limit: " + ASYNC_COMMIT_BUFFER_LIMIT);
        timerExecutor = new ScheduledThreadPoolExecutor(1,
            new NamedThreadFactory("AsyncWorker", 1, true));
        timerExecutor.scheduleAtFixedRate(new Runnable() {
            @Override
            public void run() {
                try {
                    doBranchCommits();
                } catch (Throwable e) {
                    LOGGER.info("Failed at async committing ... " + e.getMessage());
                }
            }
        }, 10, 1000 * 1, TimeUnit.MILLISECONDS);
    }
    
    private void doBranchCommits() {
        if (ASYNC_COMMIT_BUFFER.size() == 0) {
            return;
        }
        Map> mappedContexts = new HashMap<>();
        Iterator iterator = ASYNC_COMMIT_BUFFER.iterator();
        
        //一次定时任务取出ASYNC_COMMIT_BUFFER中的所有待办数据
        //以resourceId作为key分组待办数据,resourceId就是一个数据库的连接url
        //在前面的日志中可以看到,目的是为了覆盖应用的多数据源问题
        while (iterator.hasNext()) {
            Phase2Context commitContext = iterator.next();
            List contextsGroupedByResourceId = mappedContexts.get(commitContext.resourceId);
            if (contextsGroupedByResourceId == null) {
                contextsGroupedByResourceId = new ArrayList<>();
                mappedContexts.put(commitContext.resourceId, contextsGroupedByResourceId);
            }
            contextsGroupedByResourceId.add(commitContext);

            iterator.remove();

        }

        for (Map.Entry> entry : mappedContexts.entrySet()) {
            Connection conn = null;
            try {
                try {
                    //根据resourceId获取数据源以及连接
                    DataSourceProxy dataSourceProxy = DataSourceManager.get().get(entry.getKey());
                    conn = dataSourceProxy.getPlainConnection();
                } catch (SQLException sqle) {
                    LOGGER.warn("Failed to get connection for async committing on " + entry.getKey(), sqle);
                    continue;
                }
                List contextsGroupedByResourceId = entry.getValue();
                for (Phase2Context commitContext : contextsGroupedByResourceId) {
                    try {
                        //执行undolog的处理,即删除xid、branchId对应的记录
                        UndoLogManager.deleteUndoLog(commitContext.xid, commitContext.branchId, conn);
                    } catch (Exception ex) {
                        LOGGER.warn(
                            "Failed to delete undo log [" + commitContext.branchId + "/" + commitContext.xid + "]", ex);
                    }
                }

            } finally {
                if (conn != null) {
                    try {
                        conn.close();
                    } catch (SQLException closeEx) {
                        LOGGER.warn("Failed to close JDBC resource while deleting undo_log ", closeEx);
                    }
                }
            }
        }
    }
} 


所以对于commit动作的处理,RM只需删除xid、branchId对应的undolog既可


事务回滚



对于rollback场景的触发有两种情况,分支事务处理异常,即ConnectionProxy中report(false)的情况
TM捕获到下游系统上抛的异常,即发起全局事务标有@GlobalTransactional注解的方法捕获到的异常。在前面TransactionalTemplate类的execute模版方法中,对business.execute()的调用进行了catch,catch后会调用rollback,由TM通知TC对应XID需要回滚事务


public void rollback() throws TransactionException {
   //只有Launcher能发起这个rollback
   if (role == GlobalTransactionRole.Participant) {
       // Participant has no responsibility of committing
       if (LOGGER.isDebugEnabled()) {
           LOGGER.debug("Ignore Rollback(): just involved in global transaction [" + xid + "]");
       }
       return;
   }
   if (xid == null) {
       throw new IllegalStateException();
   }

   status = transactionManager.rollback(xid);
   if (RootContext.getXID() != null) {
       if (xid.equals(RootContext.getXID())) {
           RootContext.unbind();
       }
   }

} 


TC汇总后向参与者发送rollback指令,RM在AbstractRMHandler类的doBranchRollback方法中接收这个rollback的通知


protected void doBranchRollback(BranchRollbackRequest request, BranchRollbackResponse response) throws TransactionException {
    String xid = request.getXid();
    long branchId = request.getBranchId();
    String resourceId = request.getResourceId();
    String applicationData = request.getApplicationData();
    LOGGER.info("Branch rolling back: " + xid + " " + branchId + " " + resourceId);
    BranchStatus status = getResourceManager().branchRollback(request.getBranchType(), xid, branchId, resourceId, applicationData);
    response.setBranchStatus(status);
    LOGGER.info("Branch rollback result: " + status);
}

然后将rollback请求传递到DataSourceManager类的branchRollback方法


public BranchStatus branchRollback(BranchType branchType, String xid, long branchId, String resourceId, String applicationData) throws TransactionException {
    //根据resourceId获取对应的数据源
    DataSourceProxy dataSourceProxy = get(resourceId);
    if (dataSourceProxy == null) {
        throw new ShouldNeverHappenException();
    }
    try {
        UndoLogManager.undo(dataSourceProxy, xid, branchId);
    } catch (TransactionException te) {
        if (te.getCode() == TransactionExceptionCode.BranchRollbackFailed_Unretriable) {
            return BranchStatus.PhaseTwo_RollbackFailed_Unretryable;
        } else {
            return BranchStatus.PhaseTwo_RollbackFailed_Retryable;
        }
    }
    return BranchStatus.PhaseTwo_Rollbacked;

}

最终会执行UndoLogManager类的undo方法,因为是纯jdbc操作代码比较长就不贴出来了,可以通过连接到github查看,说一下undo的具体流程


根据xid和branchId查找PhaseOne阶段提交的undolog
如果找到了就根据undolog中记录的数据生成回放sql并执行,即还原PhaseOne阶段修改的数据
第2步处理完后,删除该条undolog数据
如果第1步没有找到对应的undolog,就插入一条状态为GlobalFinished的undolog.
出现没找到的原因可能是PhaseOne阶段的本地事务异常了,导致没有正常写入。因为xid和branchId是唯一索引,所以第4步的插入,可以防止PhaseOne阶段后续又写入成功,那么PhaseOne阶段就会异常,这样业务数据也是没有提交成功的,数据最终是回滚了的效果


最后希望大家能从文章中得到帮助获得收获,也可以评论出你想看哪方面的技术。文章会持续更新,希望能帮助到大家,哪怕是让你灵光一现。喜欢的朋友可以点点赞和关注,也可以分享出去让更多的人看见,一起努力一起进步!

最近发表
标签列表