Learning the go-ethereum eth/downloader Source Code (Part 1)

This is the first part of a source-code analysis of the downloader, covering downloader.go, i.e. the main flow.

New

The downloader handles chain synchronisation with the network. There are two sync modes, FullSync and FastSync; the latter only takes effect on the first run, as can be seen in the ProtocolManager source. In FullSync, the chain is built by downloading headers and bodies, with full block verification, transaction execution, and account state changes — essentially inserting blocks one by one. FastSync instead downloads headers, bodies, and receipts directly, without executing transactions, and only for the last few blocks does it fall back to FullSync-style block processing.
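
For reference, the sync modes are defined in eth/downloader/modes.go; a sketch of that file from this era of go-ethereum, with comments paraphrased (LightSync also exists for light clients, though this article focuses on the first two):

// Sketch of eth/downloader/modes.go from this era of go-ethereum
// (comments paraphrased).
package downloader

// SyncMode represents the synchronisation strategy the downloader uses.
type SyncMode int

const (
	FullSync  SyncMode = iota // download the whole chain and execute every block
	FastSync                  // download headers, bodies and receipts; execute only near the head
	LightSync                 // download and verify headers only (light clients)
)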

The downloader is created when the ProtocolManager is constructed, via the New function:

func New(mode SyncMode, stateDb ethdb.Database, mux *event.TypeMux, chain BlockChain, lightchain LightChain, dropPeer peerDropFn) *Downloader {
	if lightchain == nil {
		lightchain = chain
	}

	dl := &Downloader{
		mode:           mode,
		stateDB:        stateDb,
		mux:            mux,
		queue:          newQueue(),
		peers:          newPeerSet(),
		rttEstimate:    uint64(rttMaxEstimate),
		rttConfidence:  uint64(1000000),
		blockchain:     chain,
		lightchain:     lightchain,
		dropPeer:       dropPeer,
		headerCh:       make(chan dataPack, 1),
		bodyCh:         make(chan dataPack, 1),
		receiptCh:      make(chan dataPack, 1),
		bodyWakeCh:     make(chan bool, 1),
		receiptWakeCh:  make(chan bool, 1),
		headerProcCh:   make(chan []*types.Header, 1),
		quitCh:         make(chan struct{}),
		stateCh:        make(chan dataPack),
		stateSyncStart: make(chan *stateSync),
		syncStatsState: stateSyncStats{
			processed: rawdb.ReadFastTrieProgress(stateDb),
		},
		trackStateReq: make(chan *stateReq),
	}
	go dl.qosTuner()
	go dl.stateFetcher()
	return dl
}

First a Downloader object is built from the given parameters, then two goroutines are started, running the qosTuner and stateFetcher methods respectively.

Synchronise

From the earlier ProtocolManager study, we know Synchronise is called from two places. One is on receiving a NewBlockMsg, meaning the remote peer has a new block: if its total difficulty is greater than ours, pm's synchronise method is called. The other is in the syncer method started with pm: both the forceSync timer firing and a value arriving on newPeerCh (triggered when a new peer connects) lead to synchronise. pm's synchronise then calls the downloader's Synchronise.
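
Before looking at the downloader side, here is a runnable model of the decision pm's synchronise makes before delegating: sync only from a peer with a heavier chain, and use fast sync only while the flag set on first run is still up. The names chainInfo and pickSync are illustrative stand-ins, not geth identifiers; see eth/sync.go for the real logic:

package main

import (
	"fmt"
	"math/big"
	"sync/atomic"
)

// chainInfo and pickSync model the check in eth/sync.go: sync only from a
// peer with a heavier chain, and pick FastSync only while pm's fastSync
// flag (set on first run with an empty chain) is still 1.
type chainInfo struct {
	head string
	td   *big.Int
}

func pickSync(ours, theirs chainInfo, fastSync *uint32) (mode string, ok bool) {
	if theirs.td.Cmp(ours.td) <= 0 {
		return "", false // the peer's chain is not heavier: nothing to do
	}
	if atomic.LoadUint32(fastSync) == 1 {
		return "FastSync", true
	}
	return "FullSync", true
}

func main() {
	fast := uint32(1)
	ours := chainInfo{head: "0xaa", td: big.NewInt(100)}
	theirs := chainInfo{head: "0xbb", td: big.NewInt(150)}
	mode, ok := pickSync(ours, theirs, &fast)
	fmt.Println(ok, mode) // true FastSync
}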

Synchronise itself is implemented as follows:

func (d *Downloader) Synchronise(id string, head common.Hash, td *big.Int, mode SyncMode) error {
	err := d.synchronise(id, head, td, mode)
	switch err {
	case nil:
	case errBusy:

	case errTimeout, errBadPeer, errStallingPeer,
		errEmptyHeaderSet, errPeersUnavailable, errTooOld,
		errInvalidAncestor, errInvalidChain:
		log.Warn("Synchronisation failed, dropping peer", "peer", id, "err", err)
		if d.dropPeer == nil {
			log.Warn("Downloader wants to drop peer, but peerdrop-function is not set", "peer", id)
		} else {
			d.dropPeer(id)
		}
	default:
		log.Warn("Synchronisation failed, retrying", "err", err)
	}
	return err
}

It wraps the internal synchronise method, dropping the peer on certain classes of errors:

func (d *Downloader) synchronise(id string, hash common.Hash, td *big.Int, mode SyncMode) error {
	if d.synchroniseMock != nil {
		return d.synchroniseMock(id, hash)
	}

	if !atomic.CompareAndSwapInt32(&d.synchronising, 0, 1) {
		return errBusy
	}
	defer atomic.StoreInt32(&d.synchronising, 0)

	if atomic.CompareAndSwapInt32(&d.notified, 0, 1) {
		log.Info("Block synchronisation started")
	}

	d.queue.Reset()
	d.peers.Reset()

	for _, ch := range []chan bool{d.bodyWakeCh, d.receiptWakeCh} {
		select {
		case <-ch:
		default:
		}
	}
	for _, ch := range []chan dataPack{d.headerCh, d.bodyCh, d.receiptCh} {
		for empty := false; !empty; {
			select {
			case <-ch:
			default:
				empty = true
			}
		}
	}
	for empty := false; !empty; {
		select {
		case <-d.headerProcCh:
		default:
			empty = true
		}
	}

	d.cancelLock.Lock()
	d.cancelCh = make(chan struct{})
	d.cancelPeer = id
	d.cancelLock.Unlock()

	defer d.Cancel() // make sure no cancel channel is left open

	d.mode = mode

	p := d.peers.Peer(id)
	if p == nil {
		return errUnknownPeer
	}
	return d.syncWithPeer(p, hash, td)
}

This method does the preparation work before a sync. It first makes sure only one sync instance runs at a time (the compare-and-swap on d.synchronising), then resets queue and peers to mark the start of a fresh sync, and drains several channels left over from any previous run. Finally it takes the peer with the matching id out of peers, which holds the peerConnection objects of connected peers, added by pm through the downloader's RegisterPeer method whenever a new connection arrives.
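
The channel draining uses Go's standard non-blocking select idiom; a self-contained illustration:

package main

import "fmt"

func main() {
	ch := make(chan int, 3)
	ch <- 1
	ch <- 2

	// Drain whatever is buffered without ever blocking: the default
	// case fires as soon as the channel is empty.
	for empty := false; !empty; {
		select {
		case v := <-ch:
			fmt.Println("drained", v)
		default:
			empty = true
		}
	}
	fmt.Println("channel is now empty, a fresh sync can start")
}

With everything reset, the actual work happens in syncWithPeer: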

func (d *Downloader) syncWithPeer(p *peerConnection, hash common.Hash, td *big.Int) (err error) {
	d.mux.Post(StartEvent{})
	defer func() {
		if err != nil {
			d.mux.Post(FailedEvent{err})
		} else {
			latest := d.lightchain.CurrentHeader()
			d.mux.Post(DoneEvent{latest})
		}
	}()
	if p.version < 62 {
		return errTooOld
	}

	log.Debug("Synchronising with the network", "peer", p.id, "eth", p.version, "head", hash, "td", td, "mode", d.mode)
	defer func(start time.Time) {
		log.Debug("Synchronisation terminated", "elapsed", time.Since(start))
	}(time.Now())


	latest, err := d.fetchHeight(p)
	if err != nil {
		return err
	}
	height := latest.Number.Uint64()

	origin, err := d.findAncestor(p, latest)
	if err != nil {
		return err
	}
	d.syncStatsLock.Lock()
	if d.syncStatsChainHeight <= origin || d.syncStatsChainOrigin > origin {
		d.syncStatsChainOrigin = origin
	}
	d.syncStatsChainHeight = height
	d.syncStatsLock.Unlock()

	pivot := uint64(0)
	if d.mode == FastSync {
		if height <= uint64(fsMinFullBlocks) {
			origin = 0
		} else {
			pivot = height - uint64(fsMinFullBlocks)
			if pivot <= origin {
				origin = pivot - 1
			}
		}
	}
	d.committed = 1
	if d.mode == FastSync && pivot != 0 {
		d.committed = 0
	}

	d.queue.Prepare(origin+1, d.mode)
	if d.syncInitHook != nil {
		d.syncInitHook(origin, height)
	}

	fetchers := []func() error{
		func() error { return d.fetchHeaders(p, origin+1, pivot) }, // Headers are always retrieved
		func() error { return d.fetchBodies(origin + 1) },          // Bodies are retrieved during normal and fast sync
		func() error { return d.fetchReceipts(origin + 1) },        // Receipts are retrieved during fast sync
		func() error { return d.processHeaders(origin+1, pivot, td) },
	}
	if d.mode == FastSync {
		fetchers = append(fetchers, func() error { return d.processFastSyncContent(latest) })
	} else if d.mode == FullSync {
		fetchers = append(fetchers, d.processFullSyncContent)
	}
	return d.spawnSync(fetchers)
}

This syncs from a specific peer. It first calls fetchHeight to retrieve the peer's head header:

func (d *Downloader) fetchHeight(p *peerConnection) (*types.Header, error) {
	p.log.Debug("Retrieving remote chain height")

	head, _ := p.peer.Head()
	go p.peer.RequestHeadersByHash(head, 1, 0, false)

	ttl := d.requestTTL()
	timeout := time.After(ttl)
	for {
		select {
		case <-d.cancelCh:
			return nil, errCancelBlockFetch

		case packet := <-d.headerCh:
			if packet.PeerId() != p.id {
				log.Debug("Received headers from incorrect peer", "peer", packet.PeerId())
				break
			}
			headers := packet.(*headerPack).headers
			if len(headers) != 1 {
				p.log.Debug("Multiple headers for single request", "headers", len(headers))
				return nil, errBadPeer
			}
			head := headers[0]
			p.log.Debug("Remote head header identified", "number", head.Number, "hash", head.Hash())
			return head, nil

		case <-timeout:
			p.log.Debug("Waiting for head header timed out", "elapsed", ttl)
			return nil, errTimeout

		case <-d.bodyCh:
		case <-d.receiptCh:
		}
	}
}

Here a goroutine is started that requests the header via RequestHeadersByHash; the peer here was created by pm's newPeer method, and RequestHeadersByHash simply sends a GetBlockHeadersMsg. Because RequestHeadersByHash runs on its own goroutine, a timer is set up around it, and an error is returned on timeout.
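
This fire-a-request-then-select shape — issue the request on its own goroutine, then select over reply, cancellation, and timeout — recurs throughout the downloader. A self-contained sketch of the pattern (awaitReply is an illustrative name, not a geth function):

package main

import (
	"errors"
	"fmt"
	"time"
)

// awaitReply mirrors fetchHeight's structure: fire the request
// asynchronously, then wait on reply, cancel, or timeout.
func awaitReply(request func(), replyCh <-chan string, cancelCh <-chan struct{}, ttl time.Duration) (string, error) {
	go request()
	timeout := time.After(ttl)
	select {
	case <-cancelCh:
		return "", errors.New("sync cancelled")
	case reply := <-replyCh:
		return reply, nil
	case <-timeout:
		return "", errors.New("request timed out")
	}
}

func main() {
	replyCh := make(chan string, 1)
	head, err := awaitReply(func() { replyCh <- "header #1000" }, replyCh, nil, time.Second)
	fmt.Println(head, err) // header #1000 <nil>
}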

Recall also from pm that after the remote peer receives a GetBlockHeadersMsg it replies with a BlockHeadersMsg, and once the fetcher has filtered it, DeliverHeaders is called:

func (d *Downloader) DeliverHeaders(id string, headers []*types.Header) (err error) {
	return d.deliver(id, d.headerCh, &headerPack{id, headers}, headerInMeter, headerDropMeter)
}

func (d *Downloader) deliver(id string, destCh chan dataPack, packet dataPack, inMeter, dropMeter metrics.Meter) (err error) {
	inMeter.Mark(int64(packet.Items()))
	defer func() {
		if err != nil {
			dropMeter.Mark(int64(packet.Items()))
		}
	}()

	d.cancelLock.RLock()
	cancel := d.cancelCh
	d.cancelLock.RUnlock()
	if cancel == nil {
		return errNoSyncActive
	}
	select {
	case destCh <- packet:
		return nil
	case <-cancel:
		return errNoSyncActive
	}
}

Here the packet is sent to destCh, i.e. headerCh, which triggers the matching case in fetchHeight and returns the received header — the latest block header of that peer. Back in syncWithPeer, findAncestor is then called to locate the common ancestor from which syncing can start.

Next the pivot is set, which only matters for fast sync: for the last 64 blocks, even fast sync processes blocks in full mode. If the peer's height is at most 64, syncing starts from 0 and the pivot stays 0; otherwise, if the common ancestor computed above lies within the last 64 blocks, the origin is pulled back to just before the pivot (pivot - 1). committed is also set: it is 0 only in fast sync with a non-zero pivot, and 1 marks the end of fast-mode processing (the pivot block having been committed). Then queue.Prepare is called to get ready for the sync; Prepare mainly records the offset and sync mode so scheduling can locate the right blocks.
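
The arithmetic is compact enough to restate as a runnable sketch (computePivot is an illustrative helper, not a geth function; fsMinFullBlocks is 64 in this version of the source):

package main

import "fmt"

const fsMinFullBlocks = 64 // blocks fully processed at the head even in fast sync

// computePivot mirrors the FastSync branch in syncWithPeer.
func computePivot(height, origin uint64) (pivot, newOrigin uint64) {
	if height <= fsMinFullBlocks {
		return 0, 0 // short chain: sync everything from genesis
	}
	pivot = height - fsMinFullBlocks
	newOrigin = origin
	if pivot <= origin {
		newOrigin = pivot - 1 // ancestor inside the window: back up before the pivot
	}
	return pivot, newOrigin
}

func main() {
	pivot, origin := computePivot(1000, 990)
	fmt.Println(pivot, origin) // 936 935: blocks from the pivot on are processed in full mode
}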

Next a set of fetcher functions is assembled, for fetching headers, bodies, and receipts and for processing headers; depending on the mode, a different final processing function is appended, and finally spawnSync is called:

func (d *Downloader) spawnSync(fetchers []func() error) error {
	errc := make(chan error, len(fetchers))
	d.cancelWg.Add(len(fetchers))
	for _, fn := range fetchers {
		fn := fn
		go func() { defer d.cancelWg.Done(); errc <- fn() }()
	}
	// Wait for the first error, then terminate the others.
	var err error
	for i := 0; i < len(fetchers); i++ {
		if i == len(fetchers)-1 {
			// Close the queue when all fetchers have exited.
			// This will cause the block processor to end when
			// it has processed the queue.
			d.queue.Close()
		}
		if err = <-errc; err != nil {
			break
		}
	}
	d.queue.Close()
	d.Cancel()
	return err
}

Here a goroutine is started for each function in fetchers, and we wait for all of them to return or for the first error; note the queue is closed before waiting on the last result, so that the block processor can finish once it has drained the queue.
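
The fan-out/first-error/cancel shape that spawnSync implements with cancelCh and d.Cancel() is a common Go pattern; a minimal standalone version using context for the cancellation instead (runAll is an illustrative name):

package main

import (
	"context"
	"errors"
	"fmt"
)

// runAll starts every fetcher in its own goroutine and returns the first
// error received, cancelling the shared context so the survivors can wind
// down -- the same shape as spawnSync's errc loop plus d.Cancel().
func runAll(ctx context.Context, fetchers []func(context.Context) error) error {
	ctx, cancel := context.WithCancel(ctx)
	defer cancel()

	errc := make(chan error, len(fetchers)) // buffered: no goroutine leaks
	for _, fn := range fetchers {
		fn := fn
		go func() { errc <- fn(ctx) }()
	}
	for i := 0; i < len(fetchers); i++ {
		if err := <-errc; err != nil {
			return err // deferred cancel() signals the rest via ctx.Done()
		}
	}
	return nil
}

func main() {
	err := runAll(context.Background(), []func(context.Context) error{
		func(context.Context) error { return nil },
		func(context.Context) error { return errors.New("header fetch failed") },
	})
	fmt.Println(err) // header fetch failed
}

Now let's look at the functions in fetchers.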

fetchHeaders

func (d *Downloader) fetchHeaders(p *peerConnection, from uint64, pivot uint64) error {
	p.log.Debug("Directing header downloads", "origin", from)
	defer p.log.Debug("Header download terminated")

	skeleton := true            // Skeleton assembly phase or finishing up
	request := time.Now()       // time of the last skeleton fetch request
	timeout := time.NewTimer(0) // timer to dump a non-responsive active peer
	<-timeout.C                 // timeout channel should be initially empty
	defer timeout.Stop()

	var ttl time.Duration
	getHeaders := func(from uint64) {
		request = time.Now()

		ttl = d.requestTTL()
		timeout.Reset(ttl)

		if skeleton {
			p.log.Trace("Fetching skeleton headers", "count", MaxHeaderFetch, "from", from)
			go p.peer.RequestHeadersByNumber(from+uint64(MaxHeaderFetch)-1, MaxSkeletonSize, MaxHeaderFetch-1, false)
		} else {
			p.log.Trace("Fetching full headers", "count", MaxHeaderFetch, "from", from)
			go p.peer.RequestHeadersByNumber(from, MaxHeaderFetch, 0, false)
		}
	}
	getHeaders(from)

	for {
		select {
		case <-d.cancelCh:
			return errCancelHeaderFetch

		case packet := <-d.headerCh:
			if packet.PeerId() != p.id {
				log.Debug("Received skeleton from incorrect peer", "peer", packet.PeerId())
				break
			}
			headerReqTimer.UpdateSince(request)
			timeout.Stop()

			if packet.Items() == 0 && skeleton {
				skeleton = false
				getHeaders(from)
				continue
			}

			if packet.Items() == 0 {
				if atomic.LoadInt32(&d.committed) == 0 && pivot <= from {
					p.log.Debug("No headers, waiting for pivot commit")
					select {
					case <-time.After(fsHeaderContCheck):
						getHeaders(from)
						continue
					case <-d.cancelCh:
						return errCancelHeaderFetch
					}
				}
				p.log.Debug("No more headers available")
				select {
				case d.headerProcCh <- nil:
					return nil
				case <-d.cancelCh:
					return errCancelHeaderFetch
				}
			}
			headers := packet.(*headerPack).headers

			if skeleton {
				filled, proced, err := d.fillHeaderSkeleton(from, headers)
				if err != nil {
					p.log.Debug("Skeleton chain invalid", "err", err)
					return errInvalidChain
				}
				headers = filled[proced:]
				from += uint64(proced)
			} else {
				if n := len(headers); n > 0 {
					head := uint64(0)
					if d.mode == LightSync {
						head = d.lightchain.CurrentHeader().Number.Uint64()
					} else {
						head = d.blockchain.CurrentFastBlock().NumberU64()
						if full := d.blockchain.CurrentBlock().NumberU64(); head < full {
							head = full
						}
					}
					if head+uint64(reorgProtThreshold) < headers[n-1].Number.Uint64() {
						delay := reorgProtHeaderDelay
						if delay > n {
							delay = n
						}
						headers = headers[:n-delay]
					}
				}
			}
			if len(headers) > 0 {
				p.log.Trace("Scheduling new headers", "count", len(headers), "from", from)
				select {
				case d.headerProcCh <- headers:
				case <-d.cancelCh:
					return errCancelHeaderFetch
				}
				from += uint64(len(headers))
				getHeaders(from)
			} else {
				p.log.Trace("All headers delayed, waiting")
				select {
				case <-time.After(fsHeaderContCheck):
					getHeaders(from)
					continue
				case <-d.cancelCh:
					return errCancelHeaderFetch
				}
			}

		case <-timeout.C:
			if d.dropPeer == nil {
				p.log.Warn("Downloader wants to drop peer, but peerdrop-function is not set", "peer", p.id)
				break
			}
			p.log.Debug("Header request timed out", "elapsed", ttl)
			headerTimeoutMeter.Mark(1)
			d.dropPeer(p.id)

			for _, ch := range []chan bool{d.bodyWakeCh, d.receiptWakeCh} {
				select {
				case ch <- false:
				case <-d.cancelCh:
				}
			}
			select {
			case d.headerProcCh <- nil:
			case <-d.cancelCh:
			}
			return errBadPeer
		}
	}
}

This method is fairly long and falls roughly into two parts. First a getHeaders closure is defined for fetching a batch of headers from the peer, starting at a given position; whether we are in skeleton mode determines which headers get downloaded. In skeleton mode, it fetches 128 headers starting at from+191, with 191 blocks skipped between consecutive headers, forming a sparse skeleton. Outside skeleton mode, it requests 192 consecutive headers starting at from. Both paths use the peer's RequestHeadersByNumber method, which sends a GetBlockHeadersMsg — as analysed earlier, the reply will eventually trigger the headerCh case of the for-select loop that follows.
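
To make the skeleton geometry concrete, here is a small program computing the block numbers covered by one skeleton request, assuming this version's constants (MaxHeaderFetch = 192, MaxSkeletonSize = 128):

package main

import "fmt"

const (
	MaxHeaderFetch  = 192 // headers per linear request; also the skeleton spacing
	MaxSkeletonSize = 128 // skeleton headers per request
)

// skeletonNumbers lists the block numbers requested by
// RequestHeadersByNumber(from+MaxHeaderFetch-1, MaxSkeletonSize, MaxHeaderFetch-1, false):
// start at from+191, then one header every 192 blocks (191 skipped between each pair).
func skeletonNumbers(from uint64) []uint64 {
	nums := make([]uint64, 0, MaxSkeletonSize)
	for i := 0; i < MaxSkeletonSize; i++ {
		nums = append(nums, from+uint64(MaxHeaderFetch-1)+uint64(i*MaxHeaderFetch))
	}
	return nums
}

func main() {
	n := skeletonNumbers(1)
	fmt.Println(n[0], n[1], n[len(n)-1]) // 192 384 24576
}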

In that case, it first verifies the packet came from the expected peer, then checks the number of returned headers. If it is 0 and we are in skeleton mode, skeleton mode is disabled and getHeaders is called again to fetch linearly. If the count is 0 again while fast sync has not yet committed the pivot and the request is already at or beyond it (committed == 0 and pivot <= from), we wait fsHeaderContCheck (3 seconds) and keep retrying the linear request until a cancel signal arrives. If the count is 0 outside that situation, syncing is finished: nil is sent on headerProcCh, which drives the logic in processHeaders, and the method returns.

When the received header count is non-zero, handling depends on the mode.

In skeleton mode, fillHeaderSkeleton is called to fill in the skeleton; that method first calls ScheduleSkeleton to schedule the fill and then calls fetchParts, which is covered later.

Outside skeleton mode, we requested 192 consecutive headers starting at from; here the last few headers may be held back for reorg protection, depending on how far ahead of our own head they reach. Either way, this if-else determines the set of headers to process next. That set is then sent on headerProcCh, triggering the corresponding logic in processHeaders, from is advanced, and getHeaders is called again for the next batch — until no headers are left, in which case we wait 3 seconds and try again.

fetchBodies

func (d *Downloader) fetchBodies(from uint64) error {
	log.Debug("Downloading block bodies", "origin", from)

	var (
		deliver = func(packet dataPack) (int, error) {
			pack := packet.(*bodyPack)
			return d.queue.DeliverBodies(pack.peerID, pack.transactions, pack.uncles)
		}
		expire   = func() map[string]int { return d.queue.ExpireBodies(d.requestTTL()) }
		fetch    = func(p *peerConnection, req *fetchRequest) error { return p.FetchBodies(req) }
		capacity = func(p *peerConnection) int { return p.BlockCapacity(d.requestRTT()) }
		setIdle  = func(p *peerConnection, accepted int) { p.SetBodiesIdle(accepted) }
	)
	err := d.fetchParts(errCancelBodyFetch, d.bodyCh, deliver, d.bodyWakeCh, expire,
		d.queue.PendingBlocks, d.queue.InFlightBlocks, d.queue.ShouldThrottleBlocks, d.queue.ReserveBodies,
		d.bodyFetchHook, fetch, d.queue.CancelBodies, capacity, d.peers.BodyIdlePeers, setIdle, "bodies")

	log.Debug("Block body download terminated", "err", err)
	return err
}

Compared with fetchHeaders, fetchBodies is much simpler: it directly calls fetchParts, which is analysed below.

fetchReceipts

func (d *Downloader) fetchReceipts(from uint64) error {
	log.Debug("Downloading transaction receipts", "origin", from)

	var (
		deliver = func(packet dataPack) (int, error) {
			pack := packet.(*receiptPack)
			return d.queue.DeliverReceipts(pack.peerID, pack.receipts)
		}
		expire   = func() map[string]int { return d.queue.ExpireReceipts(d.requestTTL()) }
		fetch    = func(p *peerConnection, req *fetchRequest) error { return p.FetchReceipts(req) }
		capacity = func(p *peerConnection) int { return p.ReceiptCapacity(d.requestRTT()) }
		setIdle  = func(p *peerConnection, accepted int) { p.SetReceiptsIdle(accepted) }
	)
	err := d.fetchParts(errCancelReceiptFetch, d.receiptCh, deliver, d.receiptWakeCh, expire,
		d.queue.PendingReceipts, d.queue.InFlightReceipts, d.queue.ShouldThrottleReceipts, d.queue.ReserveReceipts,
		d.receiptFetchHook, fetch, d.queue.CancelReceipts, capacity, d.peers.ReceiptIdlePeers, setIdle, "receipts")

	log.Debug("Transaction receipt download terminated", "err", err)
	return err
}

Like fetchBodies, this directly calls fetchParts.

processHeaders

func (d *Downloader) processHeaders(origin uint64, pivot uint64, td *big.Int) error {
	rollback := []*types.Header{}
	defer func() {
		if len(rollback) > 0 {
			hashes := make([]common.Hash, len(rollback))
			for i, header := range rollback {
				hashes[i] = header.Hash()
			}
			lastHeader, lastFastBlock, lastBlock := d.lightchain.CurrentHeader().Number, common.Big0, common.Big0
			if d.mode != LightSync {
				lastFastBlock = d.blockchain.CurrentFastBlock().Number()
				lastBlock = d.blockchain.CurrentBlock().Number()
			}
			d.lightchain.Rollback(hashes)
			curFastBlock, curBlock := common.Big0, common.Big0
			if d.mode != LightSync {
				curFastBlock = d.blockchain.CurrentFastBlock().Number()
				curBlock = d.blockchain.CurrentBlock().Number()
			}
			log.Warn("Rolled back headers", "count", len(hashes),
				"header", fmt.Sprintf("%d->%d", lastHeader, d.lightchain.CurrentHeader().Number),
				"fast", fmt.Sprintf("%d->%d", lastFastBlock, curFastBlock),
				"block", fmt.Sprintf("%d->%d", lastBlock, curBlock))
		}
	}()

	gotHeaders := false

	for {
		select {
		case <-d.cancelCh:
			return errCancelHeaderProcessing

		case headers := <-d.headerProcCh:
			if len(headers) == 0 {
				for _, ch := range []chan bool{d.bodyWakeCh, d.receiptWakeCh} {
					select {
					case ch <- false:
					case <-d.cancelCh:
					}
				}
				if d.mode != LightSync {
					head := d.blockchain.CurrentBlock()
					if !gotHeaders && td.Cmp(d.blockchain.GetTd(head.Hash(), head.NumberU64())) > 0 {
						return errStallingPeer
					}
				}
				if d.mode == FastSync || d.mode == LightSync {
					head := d.lightchain.CurrentHeader()
					if td.Cmp(d.lightchain.GetTd(head.Hash(), head.Number.Uint64())) > 0 {
						return errStallingPeer
					}
				}
				rollback = nil
				return nil
			}
			gotHeaders = true
			for len(headers) > 0 {
				select {
				case <-d.cancelCh:
					return errCancelHeaderProcessing
				default:
				}
				limit := maxHeadersProcess
				if limit > len(headers) {
					limit = len(headers)
				}
				chunk := headers[:limit]

				if d.mode == FastSync || d.mode == LightSync {
					unknown := make([]*types.Header, 0, len(headers))
					for _, header := range chunk {
						if !d.lightchain.HasHeader(header.Hash(), header.Number.Uint64()) {
							unknown = append(unknown, header)
						}
					}
					frequency := fsHeaderCheckFrequency
					if chunk[len(chunk)-1].Number.Uint64()+uint64(fsHeaderForceVerify) > pivot {
						frequency = 1
					}
					if n, err := d.lightchain.InsertHeaderChain(chunk, frequency); err != nil {
						if n > 0 {
							rollback = append(rollback, chunk[:n]...)
						}
						log.Debug("Invalid header encountered", "number", chunk[n].Number, "hash", chunk[n].Hash(), "err", err)
						return errInvalidChain
					}
					rollback = append(rollback, unknown...)
					if len(rollback) > fsHeaderSafetyNet {
						rollback = append(rollback[:0], rollback[len(rollback)-fsHeaderSafetyNet:]...)
					}
				}
				if d.mode == FullSync || d.mode == FastSync {
					for d.queue.PendingBlocks() >= maxQueuedHeaders || d.queue.PendingReceipts() >= maxQueuedHeaders {
						select {
						case <-d.cancelCh:
							return errCancelHeaderProcessing
						case <-time.After(time.Second):
						}
					}
					inserts := d.queue.Schedule(chunk, origin)
					if len(inserts) != len(chunk) {
						log.Debug("Stale headers")
						return errBadPeer
					}
				}
				headers = headers[limit:]
				origin += uint64(limit)
			}
			d.syncStatsLock.Lock()
			if d.syncStatsChainHeight < origin {
				d.syncStatsChainHeight = origin - 1
			}
			d.syncStatsLock.Unlock()

			for _, ch := range []chan bool{d.bodyWakeCh, d.receiptWakeCh} {
				select {
				case ch <- true:
				default:
				}
			}
		}
	}
}

This is the last-started method of the fixed set of fetchers, and it processes the received headers.

First a deferred function is defined to roll back on failure. The body is again an infinite loop around a select; remember how fetchHeaders finally sent a batch of headers to headerProcCh — this is where that batch is handled.

The first case handled is an empty batch, which happens when fetchHeaders sends nil to headerProcCh after header fetching ends; false is then sent on both bodyWakeCh and receiptWakeCh. It also checks for a stalling peer: if the remote peer still advertises a higher total difficulty than ours, yet we could not fetch anything from it, an error is returned.

When the batch is non-empty, it is first capped at 2048 headers (maxHeadersProcess): larger batches are truncated and processed 2048 at a time. For FastSync or LightSync, the chunk is scanned for headers not yet present locally, which are collected into unknown. Then frequency is set: normally 100 (fsHeaderCheckFrequency), but 1 for chunks near the pivot (within fsHeaderForceVerify headers of it) — it controls how often headers are verified on insertion: every header close to the pivot, every 100th elsewhere. InsertHeaderChain (a method found in core/blockchain.go) is then called; the returned n indicates at which header insertion failed, and on error chunk[:n] is appended to rollback so that everything inserted so far can be undone. On success the unknown headers are appended to rollback instead; if rollback grows beyond 2048 entries (fsHeaderSafetyNet), only the last 2048 are kept.
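
The safety-net trim reuses the slice's backing array in place; a tiny runnable illustration of the idiom:

package main

import "fmt"

func main() {
	const n = 4 // stand-in for fsHeaderSafetyNet (2048 in the source)
	rollback := []int{1, 2, 3, 4, 5, 6, 7}

	// Keep only the last n entries, rewriting the slice in place --
	// the same idiom as processHeaders' safety-net trim.
	if len(rollback) > n {
		rollback = append(rollback[:0], rollback[len(rollback)-n:]...)
	}
	fmt.Println(rollback) // [4 5 6 7]
}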

Next, for FullSync or FastSync, it loops while the queue's PendingBlocks or PendingReceipts count exceeds the limit, sleeping one second per round until the backlog drains. Then queue.Schedule is called to schedule the chunk for download, and headers and origin are advanced to process the remainder of the batch.

Finally, true is sent on the d.bodyWakeCh and d.receiptWakeCh channels, waking the corresponding fetchers.

fetchParts

// The instrumentation parameters:
//  - errCancel:   error returned when the fetch operation is cancelled
//  - deliveryCh:  channel from which to retrieve downloaded data packets
//  - deliver:     callback delivering data packets into the type-specific queue
//  - wakeCh:      wakes the fetcher when new tasks arrive
//  - expire:      callback aborting requests that timed out
//  - pending:     number of tasks still awaiting download
//  - inFlight:    whether requests are currently being processed
//  - throttle:    checks whether a queue is full
//  - reserve:     reserves new download tasks for a particular peer
//  - fetchHook:   notification that new tasks are being started
//  - fetch:       sends the network request
//  - cancel:      aborts an in-flight download request
//  - capacity:    retrieves a peer's bandwidth capacity
//  - idle:        retrieves the currently idle peers
//  - setIdle:     sets a peer back to idle
//  - kind:        the type of data being downloaded
func (d *Downloader) fetchParts(errCancel error, deliveryCh chan dataPack, deliver func(dataPack) (int, error), wakeCh chan bool,
	expire func() map[string]int, pending func() int, inFlight func() bool, throttle func() bool, reserve func(*peerConnection, int) (*fetchRequest, bool, error),
	fetchHook func([]*types.Header), fetch func(*peerConnection, *fetchRequest) error, cancel func(*fetchRequest), capacity func(*peerConnection) int,
	idle func() ([]*peerConnection, int), setIdle func(*peerConnection, int), kind string) error {

	ticker := time.NewTicker(100 * time.Millisecond)
	defer ticker.Stop()

	update := make(chan struct{}, 1)

	finished := false
	for {
		select {
		case <-d.cancelCh:
			return errCancel

		case packet := <-deliveryCh:
			if peer := d.peers.Peer(packet.PeerId()); peer != nil {
				accepted, err := deliver(packet)
				if err == errInvalidChain {
					return err
				}
				if err != errStaleDelivery {
					setIdle(peer, accepted)
				}
				switch {
				case err == nil && packet.Items() == 0:
					peer.log.Trace("Requested data not delivered", "type", kind)
				case err == nil:
					peer.log.Trace("Delivered new batch of data", "type", kind, "count", packet.Stats())
				default:
					peer.log.Trace("Failed to deliver retrieved data", "type", kind, "err", err)
				}
			}

			select {
			case update <- struct{}{}:
			default:
			}

		case cont := <-wakeCh:
			if !cont {
				finished = true
			}
			select {
			case update <- struct{}{}:
			default:
			}

		case <-ticker.C:
			select {
			case update <- struct{}{}:
			default:
			}

		case <-update:

			if d.peers.Len() == 0 {
				return errNoPeers
			}

			for pid, fails := range expire() {
				if peer := d.peers.Peer(pid); peer != nil {
					if fails > 2 {
						peer.log.Trace("Data delivery timed out", "type", kind)
						setIdle(peer, 0)
					} else {
						peer.log.Debug("Stalling delivery, dropping", "type", kind)
						if d.dropPeer == nil {
							peer.log.Warn("Downloader wants to drop peer, but peerdrop-function is not set", "peer", pid)
						} else {
							d.dropPeer(pid)
						}
					}
				}
			}

			if pending() == 0 {
				if !inFlight() && finished {
					log.Debug("Data fetching completed", "type", kind)
					return nil
				}
				break
			}

			progressed, throttled, running := false, false, inFlight()
			idles, total := idle()

			for _, peer := range idles {

				if throttle() {
					throttled = true
					break
				}

				if pending() == 0 {
					break
				}

				request, progress, err := reserve(peer, capacity(peer))
				if err != nil {
					return err
				}
				if progress {
					progressed = true
				}
				if request == nil {
					continue
				}
				if request.From > 0 {
					peer.log.Trace("Requesting new batch of data", "type", kind, "from", request.From)
				} else {
					peer.log.Trace("Requesting new batch of data", "type", kind, "count", len(request.Headers), "from", request.Headers[0].Number)
				}
				if fetchHook != nil {
					fetchHook(request.Headers)
				}
				if err := fetch(peer, request); err != nil {
					panic(fmt.Sprintf("%v: %s fetch assignment failed", peer, kind))
				}
				running = true
			}

			if !progressed && !throttled && !running && len(idles) == total && pending() > 0 {
				return errPeersUnavailable
			}
		}
	}
}

This method has appeared several times already, so let's analyse it in detail. It takes a very large number of parameters; their rough roles are listed in the comment above and are worth skimming first.

First a ticker is set up, firing every 100 milliseconds; each tick writes to update, triggering the main logic. That logic first checks whether the peer count is 0, then calls expire to collect the requests aborted due to timeout. Which expire runs depends on the arguments passed in, but it is always one of the queue's ExpireXxx methods, and it returns a map of how many items each peer timed out on. Back in fetchParts, this map is traversed: a peer with more than 2 expired items is merely set back to idle, otherwise dropPeer (backed by pm's removePeer) is called to remove it.
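
The ticker/update pair forms a coalescing trigger: any number of events collapses into at most one pending wake-up, because update has capacity 1 and extra sends fall through to the default case. A standalone illustration:

package main

import (
	"fmt"
	"time"
)

func main() {
	update := make(chan struct{}, 1) // capacity 1: extra triggers coalesce

	trigger := func() {
		select {
		case update <- struct{}{}: // schedule a wake-up if none is pending
		default: // one is already pending; drop this trigger
		}
	}

	for i := 0; i < 5; i++ { // a burst of triggers...
		trigger()
	}

	runs := 0
	timeout := time.After(100 * time.Millisecond)
	for done := false; !done; {
		select {
		case <-update:
			runs++ // ...results in a single wake-up
		case <-timeout:
			done = true
		}
	}
	fmt.Println("runs:", runs) // runs: 1
}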

After handling timeouts, pending is called; it returns the size of one of the queue's task queues — headerTaskQueue, blockTaskQueue, or receiptTaskQueue — which hold the scheduled tasks. If the relevant queue is empty, inFlight is consulted, which reports whether the corresponding pending pool (headerPendPool, blockPendPool, or receiptPendPool) is non-empty. If nothing is in flight and finished is true, the download is complete and the method returns; otherwise there is temporarily nothing to do, and we go round the loop awaiting the next trigger.

Next a few flags are set, and idle fetches the peers currently idle for this data type, which are then iterated. Inside the loop, throttle is consulted — ShouldThrottleBlocks or ShouldThrottleReceipts on the queue, while the throttle passed from fillHeaderSkeleton simply returns false. These report whether the corresponding download needs throttling; if so, this round is cut short to wait for the next one, and likewise if no tasks remain pending.

Then reserve is called, corresponding to the queue's ReserveHeaders, ReserveBodies, and ReserveReceipts methods, with the result of the peer's HeaderCapacity, BlockCapacity, or ReceiptCapacity as the count. The ReserveXxx methods are covered in detail later; briefly: ReserveHeaders builds a request containing the start position, puts it into headerPendPool (which stores requests awaiting download), and returns it. ReserveBodies and ReserveReceipts classify the headers in the corresponding taskQueue, create result slots in resultCache, push headers that can be skipped back onto the taskQueue, then build a request out of the remaining headers, put it into the matching pendPool, and return it together with a progress flag. The taskQueue contents were added earlier by Schedule; here they are taken out so requests can be issued.

Back in fetchParts, with the reserved request in hand and after some simple checks, fetch is called to issue it — the peer's FetchHeaders, FetchBodies, or FetchReceipts, which simply invoke the peer's Request methods to send the request over the network based on the request's contents.

That completes one idle peer; the loop applies the same logic to every idle peer.

Besides the update logic, there is the deliveryCh case. Recall pm's handleMsg: BlockHeadersMsg, BlockBodiesMsg, and ReceiptsMsg are the replies to the request messages sent by FetchHeaders, FetchBodies, and FetchReceipts. On receiving such a reply, the downloader's corresponding DeliverXxx method is called, followed by deliver, whose main job is to push the received packet onto one of headerCh, bodyCh, or receiptCh. Those three channels are exactly what gets passed as the deliveryCh parameter when fetchParts is invoked, so a received reply triggers the deliveryCh case.

There, deliver is called first, corresponding to the queue's DeliverHeaders, DeliverBodies, and DeliverReceipts methods (detailed later). Briefly: queue.DeliverHeaders checks whether the received headers are correct; bad ones are pushed back onto headerTaskQueue for later retry, good ones are sent to headerProcCh to trigger the logic in processHeaders, and the number of accepted headers is returned. DeliverBodies and DeliverReceipts validate the received data, completing the fetchResult on success and re-queuing the task for another request on failure, and return the accepted count. The peer is then set back to idle, and finally a log line is printed.

processFastSyncContent

func (d *Downloader) processFastSyncContent(latest *types.Header) error {
	stateSync := d.syncState(latest.Root)
	defer stateSync.Cancel()
	go func() {
		if err := stateSync.Wait(); err != nil && err != errCancelStateFetch {
			d.queue.Close() // wake up Results
		}
	}()
	pivot := uint64(0)
	if height := latest.Number.Uint64(); height > uint64(fsMinFullBlocks) {
		pivot = height - uint64(fsMinFullBlocks)
	}
	var (
		oldPivot *fetchResult   // Locked in pivot block, might change eventually
		oldTail  []*fetchResult // Downloaded content after the pivot
	)
	for {
		results := d.queue.Results(oldPivot == nil) // Block if we're not monitoring pivot staleness
		if len(results) == 0 {
			if oldPivot == nil {
				return stateSync.Cancel()
			}
			select {
			case <-d.cancelCh:
				return stateSync.Cancel()
			default:
			}
		}
		if d.chainInsertHook != nil {
			d.chainInsertHook(results)
		}
		if oldPivot != nil {
			results = append(append([]*fetchResult{oldPivot}, oldTail...), results...)
		}
		if atomic.LoadInt32(&d.committed) == 0 {
			latest = results[len(results)-1].Header
			if height := latest.Number.Uint64(); height > pivot+2*uint64(fsMinFullBlocks) {
				log.Warn("Pivot became stale, moving", "old", pivot, "new", height-uint64(fsMinFullBlocks))
				pivot = height - uint64(fsMinFullBlocks)
			}
		}
		P, beforeP, afterP := splitAroundPivot(pivot, results)
		if err := d.commitFastSyncData(beforeP, stateSync); err != nil {
			return err
		}
		if P != nil {
			if oldPivot != P {
				stateSync.Cancel()

				stateSync = d.syncState(P.Header.Root)
				defer stateSync.Cancel()
				go func() {
					if err := stateSync.Wait(); err != nil && err != errCancelStateFetch {
						d.queue.Close() // wake up Results
					}
				}()
				oldPivot = P
			}
			select {
			case <-stateSync.done:
				if stateSync.err != nil {
					return stateSync.err
				}
				if err := d.commitPivotBlock(P); err != nil {
					return err
				}
				oldPivot = nil

			case <-time.After(time.Second):
				oldTail = afterP
				continue
			}
		}
		if err := d.importBlockResults(afterP); err != nil {
			return err
		}
	}
}

This method is specific to fast mode and is the final step of the whole fast sync. It first calls syncState (analysed later), which starts syncing the state trie. Meanwhile processFastSyncContent computes pivot, analogous to the pivot in syncWithPeer. It then enters an endless loop: Results takes out the blocks whose downloads have completed, and splitAroundPivot partitions them by the pivot into three parts — before the pivot, at the pivot, and after it. Blocks before the pivot are inserted into the chain directly, receipts included (via commitFastSyncData); the pivot block itself is committed with commitPivotBlock once its state sync finishes; blocks after the pivot are also inserted into the chain, but without the downloaded receipt data.
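
splitAroundPivot itself is a simple partition over the completed results; quoted from downloader.go, roughly:

// splitAroundPivot partitions completed fetch results into the blocks before
// the pivot, the pivot block itself, and the blocks after it.
func splitAroundPivot(pivot uint64, results []*fetchResult) (p *fetchResult, before, after []*fetchResult) {
	for _, result := range results {
		num := result.Header.Number.Uint64()
		switch {
		case num < pivot:
			before = append(before, result)
		case num == pivot:
			p = result
		default:
			after = append(after, result)
		}
	}
	return p, before, after
}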

processFullSyncContent

func (d *Downloader) processFullSyncContent() error {
	for {
		results := d.queue.Results(true)
		if len(results) == 0 {
			return nil
		}
		if d.chainInsertHook != nil {
			d.chainInsertHook(results)
		}
		if err := d.importBlockResults(results); err != nil {
			return err
		}
	}
}

This is the final step in full mode. Again Results fetches the blocks that have finished processing, then importBlockResults inserts them into the chain — the same as the post-pivot handling in processFastSyncContent, without inserting downloaded receipts. The detailed insertion process is analysed in the BlockChain source-code part.

processFullSyncContent and processFastSyncContent thus concentrate the difference between the two sync modes.

DeliverXxx

This is a group of four methods: DeliverHeaders, DeliverBodies, DeliverReceipts, and DeliverNodeData. The first three have same-named methods on queue, with different roles. They are used mainly by pm, to run the matching logic when the corresponding reply arrives:

func (d *Downloader) DeliverHeaders(id string, headers []*types.Header) (err error) {
	return d.deliver(id, d.headerCh, &headerPack{id, headers}, headerInMeter, headerDropMeter)
}

func (d *Downloader) DeliverBodies(id string, transactions [][]*types.Transaction, uncles [][]*types.Header) (err error) {
	return d.deliver(id, d.bodyCh, &bodyPack{id, transactions, uncles}, bodyInMeter, bodyDropMeter)
}

func (d *Downloader) DeliverReceipts(id string, receipts [][]*types.Receipt) (err error) {
	return d.deliver(id, d.receiptCh, &receiptPack{id, receipts}, receiptInMeter, receiptDropMeter)
}

func (d *Downloader) DeliverNodeData(id string, data [][]byte) (err error) {
	return d.deliver(id, d.stateCh, &statePack{id, data}, stateInMeter, stateDropMeter)
}

All of them just call deliver; the differences are the data passed in and which channel triggers the asynchronous logic.

func (d *Downloader) deliver(id string, destCh chan dataPack, packet dataPack, inMeter, dropMeter metrics.Meter) (err error) {
	inMeter.Mark(int64(packet.Items()))
	defer func() {
		if err != nil {
			dropMeter.Mark(int64(packet.Items()))
		}
	}()

	d.cancelLock.RLock()
	cancel := d.cancelCh
	d.cancelLock.RUnlock()
	if cancel == nil {
		return errNoSyncActive
	}
	select {
	case destCh <- packet:
		return nil
	case <-cancel:
		return errNoSyncActive
	}
}

There is no real action here; the key point is handing the received data to destCh to trigger the follow-up logic — for the first three methods, the deliveryCh case in fetchParts; for the fourth, the logic of runStateSync in statesync.

Header image from Unsplash: https://unsplash.com/photos/PHdLDymaV90