C#多线程编程中的锁系统(四):自旋锁
目录
一:基础
二:自旋锁示例
三:spinlock
四:继续spinlock
五:总结
一:基础
内核锁:基于内核对象构造的锁机制,就是通常说的内核构造模式。
优点:cpu利用最大化。它发现资源被锁住,请求就排队等候。线程切换到别处干活,直到接受到可用信号,线程再切回来继续处理请求。
缺点:托管代码->用户模式代码->内核代码损耗、线程上下文切换损耗。
在锁的时间比较短时,系统频繁忙于休眠、切换,是个很大的性能损耗。
自旋锁:原子操作+自循环。通常说的用户构造模式。 线程不休眠,一直循环尝试对资源访问,直到可用。
优点:完美解决内核锁的缺点。
缺点:长时间一直循环会导致cpu的白白浪费,高并发竞争下、cpu的消耗特别严重。
混合锁:内核锁+自旋锁。 混合锁是先自旋锁一段时间或自旋多少次,再转成内核锁。
优点:内核锁和自旋锁的折中方案,利用前二者优点,避免出现极端情况(自旋时间过长,内核锁时间过短)。
缺点: 自旋多少时间、自旋多少次,这些策略很难把控。
ps:操作系统或net框架,这块算法策略做的已经非常优了,有些api函数也提供了时间及次数可配置项,让开发者根据需求自行判断。
二:自旋锁示例
来看下我们自己简单实现的自旋锁:
int signal = 0;
var li = new list<int>();
parallel.for(0, 1000 * 10000, r =>
{
while (interlocked.exchange(ref signal, 1) != 0)//加自旋锁
{
//黑魔法
}
li.add(r);
interlocked.exchange(ref signal, 0); //释放锁
});
console.writeline(li.count);
//输出:10000000
上面就是自旋锁:interlocked.exchange+while
1:定义signal 0可用,1不可用。
2:parallel模拟并发竞争,原子更改signal状态。 后续线程自旋访问signal,是否可用。
3:a线程使用完后,更改signal为0。 剩余线程竞争访问资源,b线程胜利后,更改signal为1,失败线程继续自旋,直到可用。
三:spinlock
spinlock是net4.0后系统帮我们实现的自旋锁,内部做了优化。
简单看下实例:
var li = new list<int>();
var sl = new spinlock();
parallel.for(0, 1000 * 10000, r =>
{
bool gotlock = false; //释放成功
sl.enter(ref gotlock); //进入锁
li.add(r);
if (gotlock) sl.exit(); //释放
});
console.writeline(li.count);
//输出:10000000
四:继续spinlock
new spinlock(false) 这个构造函数主要用来帮我们检查死锁用,true是开启。
开启状态下,如果发生死锁会直接抛异常的。
贴了一部分源码(已折叠),我们来看下:
public void enter(ref bool locktaken)
{
if (locktaken)
{
locktaken = false;
throw new system.argumentexception(environment.getresourcestring("spinlock_tryreliableenter_argumentexception"));
}
// fast path to acquire the lock if the lock is released
// if the thread tracking enabled set the new owner to the current thread id
// id not, set the anonymous bit lock
int observedowner = m_owner;
int newowner = 0;
bool threadtrackingenabled = (m_owner & lock_id_disable_mask) == 0;
if (threadtrackingenabled)
{
if (observedowner == lock_unowned)
newowner = thread.currentthread.managedthreadid;
}
else if ((observedowner & lock_anonymous_owned) == lock_unowned)
{
newowner = observedowner | lock_anonymous_owned; // set the lock bit
}
if (newowner != 0)
{
#if !feature_coreclr
thread.begincriticalregion();
#endif
#if pfx_legacy_3_5
if (interlocked.compareexchange(ref m_owner, newowner, observedowner) == observedowner)
{
locktaken = true;
return;
}
#else
if (interlocked.compareexchange(ref m_owner, newowner, observedowner, ref locktaken) == observedowner)
{
// fast path succeeded
return;
}
#endif
#if !feature_coreclr
thread.endcriticalregion();
#endif
}
//fast path failed, try slow path
continuetryenter(timeout.infinite, ref locktaken);
}
private void continuetryenter(int millisecondstimeout, ref bool locktaken)
{
long startticks = 0;
if (millisecondstimeout != timeout.infinite && millisecondstimeout != 0)
{
startticks = datetime.utcnow.ticks;
}
#if !feature_pal && !feature_coreclr // pal doesn't support eventing, and we don't compile cds providers for coreclr
if (cdssyncetwbclprovider.log.isenabled())
{
cdssyncetwbclprovider.log.spinlock_fastpathfailed(m_owner);
}
#endif
if (isthreadownertrackingenabled)
{
// slow path for enabled thread tracking mode
continuetryenterwiththreadtracking(millisecondstimeout, startticks, ref locktaken);
return;
}
// then thread tracking is disabled
// in this case there are three ways to acquire the lock
// 1- the first way the thread either tries to get the lock if it's free or updates the waiters, if the turn >= the processors count then go to 3 else go to 2
// 2- in this step the waiter threads spins and tries to acquire the lock, the number of spin iterations and spin count is dependent on the thread turn
// the late the thread arrives the more it spins and less frequent it check the lock avilability
// also the spins count is increaes each iteration
// if the spins iterations finished and failed to acquire the lock, go to step 3
// 3- this is the yielding step, there are two ways of yielding thread.yield and sleep(1)
// if the timeout is expired in after step 1, we need to decrement the waiters count before returning
int observedowner;
//***step 1, take the lock or update the waiters
// try to acquire the lock directly if possoble or update the waiters count
spinwait spinner = new spinwait();
while (true)
{
observedowner = m_owner;
if ((observedowner & lock_anonymous_owned) == lock_unowned)
{
#if !feature_coreclr
thread.begincriticalregion();
#endif
#if pfx_legacy_3_5
if (interlocked.compareexchange(ref m_owner, observedowner | 1, observedowner) == observedowner)
{
locktaken = true;
return;
}
#else
if (interlocked.compareexchange(ref m_owner, observedowner | 1, observedowner, ref locktaken) == observedowner)
{
return;
}
#endif
#if !feature_coreclr
thread.endcriticalregion();
#endif
}
else //failed to acquire the lock,then try to update the waiters. if the waiters count reached the maximum, jsut break the loop to avoid overflow
if ((observedowner & waiters_mask) == maximum_waiters || interlocked.compareexchange(ref m_owner, observedowner + 2, observedowner) == observedowner)
break;
spinner.spinonce();
}
// check the timeout.
if (millisecondstimeout == 0 ||
(millisecondstimeout != timeout.infinite &&
timeoutexpired(startticks, millisecondstimeout)))
{
decrementwaiters();
return;
}
//***step 2. spinning
//lock acquired failed and waiters updated
int turn = ((observedowner + 2) & waiters_mask) / 2;
int processorcount = platformhelper.processorcount;
if (turn < processorcount)
{
int processfactor = 1;
for (int i = 1; i <= turn * spinning_factor; i++)
{
thread.spinwait((turn + i) * spinning_factor * processfactor);
if (processfactor < processorcount)
processfactor++;
observedowner = m_owner;
if ((observedowner & lock_anonymous_owned) == lock_unowned)
{
#if !feature_coreclr
thread.begincriticalregion();
#endif
int newowner = (observedowner & waiters_mask) == 0 ? // gets the number of waiters, if zero
observedowner | 1 // don't decrement it. just set the lock bit, it is zzero because a previous call of exit(false) ehich corrupted the waiters
: (observedowner - 2) | 1; // otherwise decrement the waiters and set the lock bit
contract.assert((newowner & waiters_mask) >= 0);
#if pfx_legacy_3_5
if (interlocked.compareexchange(ref m_owner, newowner, observedowner) == observedowner)
{
locktaken = true;
return;
}
#else
if (interlocked.compareexchange(ref m_owner, newowner, observedowner, ref locktaken) == observedowner)
{
return;
}
#endif
#if !feature_coreclr
thread.endcriticalregion();
#endif
}
}
}
// check the timeout.
if (millisecondstimeout != timeout.infinite && timeoutexpired(startticks, millisecondstimeout))
{
decrementwaiters();
return;
}
//*** step 3, yielding
//sleep(1) every 50 yields
int yieldsofar = 0;
while (true)
{
observedowner = m_owner;
if ((observedowner & lock_anonymous_owned) == lock_unowned)
{
#if !feature_coreclr
thread.begincriticalregion();
#endif
int newowner = (observedowner & waiters_mask) == 0 ? // gets the number of waiters, if zero
observedowner | 1 // don't decrement it. just set the lock bit, it is zzero because a previous call of exit(false) ehich corrupted the waiters
: (observedowner - 2) | 1; // otherwise decrement the waiters and set the lock bit
contract.assert((newowner & waiters_mask) >= 0);
#if pfx_legacy_3_5
if (interlocked.compareexchange(ref m_owner, newowner, observedowner) == observedowner)
{
locktaken = true;
return;
}
#else
if (interlocked.compareexchange(ref m_owner, newowner, observedowner, ref locktaken) == observedowner)
{
return;
}
#endif
#if !feature_coreclr
thread.endcriticalregion();
#endif
}
if (yieldsofar % sleep_one_frequency == 0)
{
thread.sleep(1);
}
else if (yieldsofar % sleep_zero_frequency == 0)
{
thread.sleep(0);
}
else
{
#if pfx_legacy_3_5
platform.yield();
#else
thread.yield();
#endif
}
if (yieldsofar % timeout_check_frequency == 0)
{
//check the timeout.
if (millisecondstimeout != timeout.infinite && timeoutexpired(startticks, millisecondstimeout))
{
decrementwaiters();
return;
}
}
yieldsofar++;
}
}
/// <summary>
/// decrements the waiters, in case of the timeout is expired
/// </summary>
private void decrementwaiters()
{
spinwait spinner = new spinwait();
while (true)
{
int observedowner = m_owner;
if ((observedowner & waiters_mask) == 0) return; // don't decrement the waiters if it's corrupted by previous call of exit(false)
if (interlocked.compareexchange(ref m_owner, observedowner - 2, observedowner) == observedowner)
{
contract.assert(!isthreadownertrackingenabled); // make sure the waiters never be negative which will cause the thread tracking bit to be flipped
break;
}
spinner.spinonce();
}
}
从代码中发现spinlock并不是我们简单的实现那样一直自旋,其内部做了很多优化。
1:内部使用了interlocked.compareexchange保持原子操作, m_owner 0可用,1不可用。
2:第一次获得锁失败后,继续调用continuetryenter,continuetryenter有三种获得锁的情况。
3:continuetryenter函数第一种获得锁的方式。 使用了while+spinwait,后续再讲。
4:第一种方式达到最大等待者数量后,命中走第二种。 继续自旋 turn * 100次。100这个值是处理器核数(4, 8 ,16)下最好的。
5:第二种如果还不能获得锁,走第三种。 这种就有点混合构造的意味了,如下:
if (yieldsofar % 40 == 0)
thread.sleep(1);
else if (yieldsofar % 10 == 0)
thread.sleep(0);
else
thread.yield();
thread.sleep(1) : 终止当前线程,放弃剩下时间片 休眠1毫秒。 退出跟其他线程抢占cpu。当然这个一般会更多,系统无法保证这么细的时间粒度。
thread.sleep(0): 终止当前线程,放弃剩下时间片。 但立马还会跟其他线程抢cpu,能不能抢到跟线程优先级有关。
thread.yeild(): 结束当前线程。让出cpu给其他准备好的线程。其他线程ok后或没有准备好的线程,继续执行。 跟优先级无关。
thread.yeild()还会返回个bool值,是否让出成功。
从源码中,我们可以学到不少编程技巧。 比如我们也可以使用 自旋+thread.yeild() 或 while+thread.yeild() 等组合。
五:总结
本章谈了自旋锁的基础+楼主的经验。 spinlock类源码这块,只粗浅理解了下,并没有深究。
测了下spinlock和自己实现的自旋锁性能对比(并行添加1000w list<int>()),spinlock是单纯的自旋锁性能2倍以上。
还测了下lock的性能,是系统spinlock性能的3倍以上。 可见lock内部自旋的效率更高,clr暂没开源,所以看不到clr具体实现的代码。
上一篇: 理解C#中的Lambda表达式
下一篇: C#中的高阶函数介绍