
diy solar


Why FET BMS fail

bpaddock
New Member · Joined Oct 9, 2021 · Messages: 12
In a recent video, Wil made the comment that FET BMSes fail often. This is why:

A common problem I see in circuits on YouTube and other sites is that someone's circuit failed, and went up in smoke.
People in the more esoteric realms blame this on things like "Subtle Energy" overload and other such minutiae.
Here is the far more realistic explanation:

The very old GE SCR Manual Including Triacs and Other Thyristors goes into all of the gory details of what is happening inside the part when the "Magick smoke comes out". As it is unlikely you have the manual at hand, here it is in a nutshell:

What lets the Magick Smoke out of IGBTs, FETs and SCRs in most cases is turning them on too slowly, which causes 'spot heating' of the die.

Think of a FET or SCR as hundreds of thousands, possibly millions, of very small resistors all in parallel (resistors in parallel give a lower total resistance), each of which can be turned on and off individually. The 'resistors' closest to the gate turn on first, and as the gate potential spreads across the die, the rest turn on. The ones farthest from the gate pin turn on last.

With a slow gate turn-on, a few of the small 'resistors' nearest the gate try to carry the entire load, which they can't do, so they burn up, but the device does not fail quite yet. The next time the device is turned on (which may be only milliseconds later at a high switching frequency, or days later in other applications), some more of the resistors further in burn up. When the point is reached where there are simply not enough 'resistors' left to carry the load, the Magick Smoke escapes and the part dies a catastrophic death.
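To put rough numbers on the "many tiny resistors" picture, here is a toy Python sketch. It illustrates the analogy only and is not a device model; the cell count and load current are made-up values. If only a fraction of the cells conduct when the load arrives, each conducting cell carries proportionally more current, and its I²R heating grows with the square of that.

```python
# Toy model of the "millions of tiny parallel resistors" analogy.
# Hypothetical numbers; this is NOT a physical MOSFET model.

def current_per_cell_a(total_current_a: float, n_cells: int, fraction_on: float) -> float:
    """Load current carried by each cell when only fraction_on of them conduct."""
    n_on = max(1, int(n_cells * fraction_on))  # at least one cell carries the load
    return total_current_a / n_on


def relative_heating_per_cell(fraction_on: float) -> float:
    """Per-cell I^2*R heating relative to a fully-on die.

    Current per conducting cell scales as 1/fraction_on, so the power in
    each conducting cell scales as 1/fraction_on**2.
    """
    return 1.0 / fraction_on ** 2


if __name__ == "__main__":
    load_a = 100.0        # hypothetical load current, amps
    cells = 1_000_000     # hypothetical number of parallel cells
    for frac in (0.01, 0.1, 0.5, 1.0):
        i_cell = current_per_cell_a(load_a, cells, frac)
        print(f"{frac:>4.0%} of die on: {i_cell * 1e6:8.1f} uA per cell, "
              f"{relative_heating_per_cell(frac):7.0f}x heating per cell")
```

With these made-up numbers, a die that is only 1% on when the full load arrives heats each conducting cell about 10,000x harder than a fully-on die. A later reply disputes that MOSFETs turn on the way thyristors do, so treat this as an illustration of current crowding, not of MOSFET physics.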

This is why the parts generally run "for a while" before failing. If it fails as soon as you fire it up the first time, you either have a catastrophic short in the load (possibly shorted caps that take a bit of time to 'wake up' before they hold a charge, generally fixed with 'soft start'), or the gate drive really sucked big time.

There need to be a few *amps* of current pumped into the gate of the larger parts, in a very short time, to get the gate potential to spread across the entire die as fast as possible.
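A quick way to see why the gate needs amps rather than milliamps: the gate behaves roughly like a charge bucket, so the switching time is about t = Q / I. The sketch below assumes a hypothetical 150 nC of gate charge per large MOSFET and ten of them in parallel; real values come from the datasheet's total gate charge (Qg) figure.

```python
# Rough turn-on time estimate: t ~= Q_total / I_gate (constant-current drive).
# Gate charge and FET count below are hypothetical example values.

def gate_charge_time_s(qg_per_fet_c: float, n_fets: int, gate_current_a: float) -> float:
    """Time to push the combined gate charge of n_fets parallel MOSFETs."""
    if gate_current_a <= 0:
        raise ValueError("gate current must be positive")
    return qg_per_fet_c * n_fets / gate_current_a


if __name__ == "__main__":
    qg = 150e-9  # hypothetical total gate charge per large MOSFET, coulombs
    n = 10       # hypothetical number of paralleled MOSFETs
    for i_gate in (0.01, 0.1, 1.0, 3.0):
        t = gate_charge_time_s(qg, n, i_gate)
        print(f"{i_gate:5.2f} A gate drive -> {t * 1e6:8.2f} us to switch {n} FETs")
```

With those assumed numbers, 10 mA of drive takes about 150 µs to switch the bank, while 3 A does it in about 0.5 µs. This ignores driver resistance and the Miller plateau, so it is an order-of-magnitude estimate only.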

You also want to get the thing turned off as fast as possible.

If you are not familiar with the concept of Magick Smoke: all electronic parts run on Magick Smoke, because once the smoke comes out of the part, it no longer runs...

The bottom line is that designers need to stop cheaping out on gate drive and do it correctly, with a lot of current in a very short time.
Then there are other considerations about too much, too fast, that need to be dealt with, but that is beyond what I wanted to cover here.
 
That was really interesting, thank you. I'd never really taken the time to consider why MOSFETs fail when they do, I just kind of accepted that they fail sometimes.

Do you by chance have the name of the manual or the publication/ISBN number? I work a lot with microelectronics and this is something I would be interested in learning more about.
 
Usually, it's the quality of the MOSFETs. If they used name-brand parts (On-Semi, Infineon, Vishay, etc.), they would be more reliable.
 
quality of the MOSFETs
Not all FETs are born equal, and neither are humans... The QA department in every company has certain tolerance limits set by industry standards, target markets, sales policy, etc. Cheap vendors accept high dispersion due to bad manufacturing, so 10 FETs from them in parallel means far less than 10x each one's capability. It could actually be 3x or worse for the bad ones, which more often than not are QA rejects that make their way to the "gray" market or trunk sales.
Paralleling 10 or 20 or 30 MOSFETs is like fighting a fire with many garden hoses when what you actually need is a handful of firefighting-grade ones.
Real industrial-grade equipment uses only 2 or 4 strong MOSFETs as a two-way disconnect switch, but they are big and expensive types... so no super discounts for the eastern-asian-backshed-electronics-builders, and you won't find them in a pizza-priced BMS.
When you buy a 200A continuous BMS, which obviously includes a >300A peak current SSR (solid state relay) on top of everything else a BMS has, you don't really wonder how much a dedicated, industrial-grade 200A SSR costs, and why it costs so much. You got such a bargain, didn't you? Or...?
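The "10 FETs in parallel is not 10x" point can be sketched with a simple static current-sharing calculation: paralleled MOSFETs split current in proportion to their conductance, 1/Rds(on), so the lowest-resistance part hits its limit first. The Rds(on) values and the 20 A per-FET limit below are hypothetical, and the model ignores thermal feedback (Rds(on) rises as a FET heats, which partly rebalances sharing) and switching-time effects.

```python
# Static current sharing among paralleled MOSFETs with unequal Rds(on).
# Hypothetical values; ignores temperature feedback and layout/inductance effects.

def current_shares_a(rds_on_ohm: list[float], total_current_a: float) -> list[float]:
    """Each FET carries current in proportion to its conductance 1/Rds(on)."""
    g = [1.0 / r for r in rds_on_ohm]
    g_total = sum(g)
    return [total_current_a * gi / g_total for gi in g]


def usable_total_current_a(rds_on_ohm: list[float], per_fet_limit_a: float) -> float:
    """Largest total current before the most heavily loaded FET reaches its limit."""
    g = [1.0 / r for r in rds_on_ohm]
    return per_fet_limit_a * sum(g) / max(g)


if __name__ == "__main__":
    limit = 20.0                      # hypothetical per-FET current limit, amps
    matched = [0.002] * 10            # ten identical 2 mOhm parts
    spread = [0.001] + [0.003] * 9    # one low-resistance outlier, nine high ones
    print(f"matched: {usable_total_current_a(matched, limit):.0f} A")  # 10x one FET
    print(f"spread:  {usable_total_current_a(spread, limit):.0f} A")   # only about 4x
```

In this made-up spread case the outlier takes a quarter of the total current, so ten FETs only deliver about four FETs' worth of capacity.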
 
name of the manual or publication

The manual he's talking about might be this old book. However, it's super technical, and unfortunately most of the semiconductors and techniques in there are obsolete, since back then the SCR was the only high-power solid-state switch known. These days, using high-power phase-cut control in residential apps or small shops would be inviting the utility company to your door for a drink. :devilish: ...or at least to change your broken smart electricity meter and ask you to stop whatever you were doing to it.
The basic principles still stand, though. Protection is covered in chapter 15 onward.
 
Exactly!
I'm in the mindset of buying it cheap and backing the hell down on the demands I put on it.
I have multiple JK BMSes, all rated for 200 amps, but none will ever run there continuously. I believe it was member RCinFLA who had eye-opening graphics on load and heat that completely changed my thinking. My combined peak load rating for the BMSes is over 1000 amps, but the most my system can possibly inrush is 500 amps. That being said, a high continuous load is only 175 amps. Basically, I'm using less than 30% of the continuous BMS rating, charge or discharge.
Heat and high inrush are probably the biggest killers of FETs, especially cheap ones.
 
Sounds like more or less what I have done. I have three 300 amp BMSes, and the maximum continuous draw with both inverters running is only 72 amps. Each line from the positive side of the batteries has an 80 amp T-type fuse within 10" of the battery, and runs from there to the bus bars.

So in theory, any one of the BMSes could handle the full continuous load of the house and the shop with all power draws operating and still be at roughly 1/4 of its capacity. I never trust chinesium components any further than about 60% of their nameplate rating.
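The derating arithmetic for that setup, as a small Python sketch. The even split across the three BMSes is an assumption; parallel packs only share current approximately evenly.

```python
# Derating arithmetic for the setup described above:
# three 300 A BMSes, 72 A maximum continuous draw.

def utilization(load_a: float, rating_a: float) -> float:
    """Fraction of a BMS's continuous rating used by a given load."""
    return load_a / rating_a


if __name__ == "__main__":
    load_a, rating_a, n_bms = 72.0, 300.0, 3
    trust_fraction = 0.60  # "never trust ... further than about 60% of nameplate"

    print(f"one BMS carrying it all: {utilization(load_a, rating_a):.0%}")          # 24%
    print(f"split evenly across {n_bms}: {utilization(load_a / n_bms, rating_a):.0%}")  # 8%
    print(f"derated ceiling per BMS: {trust_fraction * rating_a:.0f} A")            # 180 A
```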
 
So I'm just mocking up a portable 12v work battery and wiring the power in and out of the BMS. I didn't like the copper threads in the PC board and thought I could just pass a grub screw through and put a jam nut on the back side. Then, as I was cleaning the flux and solder berries off the back side so the nut could sit flush, I thought, why not just put lugs on both sides? I was going to go with four #6 wires on each side. That changed when I realized that my #1 lugs were bogus and couldn't accommodate four #6 cables, so I went with three #6 cables into a #2 lug.

This thread got me looking and thinking. Visually, the traces on the board and the layers in between, including the little copper jumpers on it, have nowhere near the copper cross-section to carry very much current continuously. The only advantage they have is surface area, which becomes moot once the immediate area heats up. My wiring is overkill, and if anything it would help pull heat off the board.
Yeah, if you have a JK 200 with a 350 surge rating, I'd be skeptical about how much you can really pull and for how long. I'm tempted to put a pair of 1" 12 volt fans positioned end-on to the BMS, controlled with a 150°F thermal switch.
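For a feel of how little current ordinary PCB copper carries, here is a sketch using the classic IPC-2221 curve fit, I = k·ΔT^0.44·A^0.725, with A in square mils and k = 0.048 for outer layers (0.024 for inner). IPC-2152 is the newer, more detailed standard, and the chart data behind this fit only covers a few tens of amps, so treat it as a rough estimate. The trace dimensions are hypothetical, and it does not account for the copper jumpers or multiple layers in parallel.

```python
# IPC-2221 curve fit for PCB trace ampacity (older standard; IPC-2152 is newer).
#   I = k * dT**0.44 * A**0.725,  A in square mils,
#   k = 0.048 (outer layer) or 0.024 (inner layer)

MIL_PER_OZ_COPPER = 1.378  # 1 oz/ft^2 copper is about 1.378 mil (35 um) thick


def ipc2221_current_a(width_mil: float, copper_oz: float,
                      temp_rise_c: float, outer_layer: bool = True) -> float:
    """Estimated continuous current for a single trace at a given temperature rise."""
    k = 0.048 if outer_layer else 0.024
    area_sq_mil = width_mil * copper_oz * MIL_PER_OZ_COPPER
    return k * temp_rise_c ** 0.44 * area_sq_mil ** 0.725


if __name__ == "__main__":
    # Hypothetical trace: 400 mil (about 10 mm) wide, 2 oz copper, 20 C rise.
    print(f"{ipc2221_current_a(400, 2, 20):.1f} A")
```

With those assumed dimensions the fit gives roughly 29 A, which is why high-current BMS boards lean on bus bars, jumpers and stacked layers rather than plain traces.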
 
Are the 100A BMSes used in pre-made server rack and wall batteries at a price/performance/current point where they can last a while, given current technology?

I think wall battery BMSes are preprogrammed for lower than their max, which is why I lump them into the 100A class.
 
Many of them are JBD, or some other readily available BMS like a Seplos, Pace or whatever. Some are custom. I don't believe there will be any issues; the tech is pretty mature and well understood.
 
Those batteries are probably capable of handling what they claim (buffer zone). As for the JK, it probably depends more on the luck of component quality and the state of health of the assembler. Off Grid Garage managed to let the magic smoke out of the same model shown above when the 4-8S first came out. It didn't completely fail, and it didn't do an overcurrent disconnect at rated load when it happened. Others have had other issues with the first lots.
While I really like the JK line of (non-inverter) BMSes for cost and features, I'm self-imposing the derating.
 
I think that's the best bet for any FET-based BMS, at least until we get a company that builds a really high-quality FET-based unit.
 
Interesting, this is way above my pay grade... After @A.Justice gets the white papers and absorbs them, maybe he can do a USMC-styled coloring book to help me study... I promise not to eat the crayons until I have finished the lesson.
A good friend of mine is in the Marine Corps, and I can absolutely attest to the fact that he'd be way more interested in the crayons.
 
The biggest issue with Chinese BMSes is lack of sufficient heat sinking. Most attach the outer aluminum cover plates to the plastic side of the MOSFETs via thermal sponge pads. MOSFET current and the resulting heating are not uniform across the multiple series-pass MOSFETs. You can't get a much worse heat sink arrangement.

Chinese BMS series resistance claims are usually lower than actual. PCB runner and current shunt resistance are usually not included in the series resistance spec. Series resistance is typically 1.5x the stated spec when the BMS gets hot.
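Because conduction loss is P = I²R, an understated resistance matters more at high current. A short sketch using a hypothetical 0.5 mΩ spec value and the 1.5x hot factor from the post:

```python
# Conduction loss P = I^2 * R for a BMS pass path.
# The spec resistance is a hypothetical example value; the 1.5x hot factor
# is the rule of thumb quoted in the post.

def conduction_loss_w(current_a: float, resistance_ohm: float) -> float:
    """Heat dissipated in the pass path at a given current."""
    return current_a ** 2 * resistance_ohm


if __name__ == "__main__":
    r_spec = 0.5e-3          # hypothetical datasheet series resistance, ohms
    r_hot = 1.5 * r_spec     # hot, including PCB runners and shunt
    for amps in (50, 100, 150, 200):
        print(f"{amps:3d} A: {conduction_loss_w(amps, r_spec):5.1f} W spec, "
              f"{conduction_loss_w(amps, r_hot):5.1f} W hot")
```

Doubling the current quadruples the heat, so going from 50 A to 200 A means 16x the dissipation, before the hot-resistance penalty is added on top.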

At BMS rated current, MOSFET die temperature gets much higher than the outside metal surface of the BMS due to the poor heat sinking arrangement. The die gets close to the 175°C semiconductor limit recommended for long-term reliability.

Turn-on time is slow because there is little MOSFET gate drive current for the massive total gate capacitance of multiple parallel low-Rds_On MOSFETs, each of which has a lot of gate capacitance.

The biggest damage risk is BMS turn-on into discharged inverter battery-line capacitors, which can draw 3000 amps of peak capacitor charge current for milliseconds. The issue is metal fusing within the MOSFETs due to the high peak current. The BMS max-current shutdown trigger is not fast enough to save the MOSFETs from damage because of the low gate drive current.

Some BMSes have a controlled turn-on surge current limiter, but this is of no help if the BMS is already active when an external circuit breaker is flipped on to uncharged inverter DC input capacitors.

Current profile for a 5kVA inverter with a net 14 milliohms of battery/cabling series resistance:
[Image: MPP 5kW inverter surge current with 14 milliohm battery path resistance]
[Image: JK 16-24S active balancer BMS temperature vs. current]
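A first-order RC sketch of that capacitor inrush: the peak current is about V / R, the time constant is R·C, and the I²t (the quantity that matters for fusing) is I_peak²·τ/2 for an exponential decay. The 52 V pack voltage and 10 mF of inverter input capacitance are hypothetical; the 14 mΩ path comes from the post, and the 10 Ω precharge resistor is an illustrative value. Cable inductance, ignored here, slows the rise and lowers the real peak.

```python
# First-order RC inrush into a discharged inverter input capacitor bank.
# Ignores cable inductance (lowers the real peak) and capacitor ESR.

def rc_inrush(v_batt: float, r_path_ohm: float, c_farad: float):
    """Return (peak current A, time constant s, I^2*t A^2*s)."""
    i_peak = v_batt / r_path_ohm
    tau = r_path_ohm * c_farad
    # Integral of (i_peak * exp(-t/tau))**2 from 0 to infinity = i_peak**2 * tau / 2
    i2t = i_peak ** 2 * tau / 2.0
    return i_peak, tau, i2t


if __name__ == "__main__":
    v = 52.0       # hypothetical 16S LiFePO4 pack voltage
    c = 10e-3      # hypothetical inverter DC input capacitance, farads
    for label, r in (("14 mOhm path only", 0.014), ("10 ohm precharge", 10.0)):
        i_pk, tau, i2t = rc_inrush(v, r, c)
        print(f"{label:>18}: peak {i_pk:8.1f} A, tau {tau * 1e3:7.2f} ms, I2t {i2t:7.2f} A2s")
    # Charging C from 0 V through any resistance dissipates 0.5 * C * V**2 in
    # that resistance; a precharge resistor spreads it out over time.
    print(f"energy into the resistance: {0.5 * c * v ** 2:.1f} J")
```

With these assumed values the bare path sees a peak of about 3700 A, while a precharge resistor cuts the I²t by several hundred times. That only helps if the precharge is actually in the path when the breaker closes, which is the point made above about an already-active BMS.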
 
Hard to agree with most of the opening post.
That failure mode is not valid for MOSFETs.
Gate potential doesn't "spread around" in MOSFETs in the same manner as in thyristors and triacs.

"As fast as possible" also very quickly becomes "too much of a good thing". Faster turn-off means a bigger inductive kickback, and the faster you do it, the less chance TVS diodes (if any) have to clamp it fully, leading to MOSFETs avalanching above their maximum voltage. Dynamic current sharing can also get nasty when attempting too-fast turn-on/off, and it will be hugely dependent on PCB layout.

What you certainly want is a ROBUST gate drive; the last thing you want is for the gate drive to hang or oscillate halfway between on and off.
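The turn-off trade-off described above can be put in rough numbers: the voltage across the stray inductance is V = L·di/dt, and the energy a clamp (TVS or MOSFET avalanche) must absorb is the standard unclamped-inductive-switching estimate ½·L·I²·V_clamp/(V_clamp − V_supply). The 1 µH loop inductance, 200 A current, 100 V clamp and 52 V pack below are hypothetical values.

```python
# Inductive kick at turn-off and the energy dumped into the clamp.
# Stray inductance, current and voltages are hypothetical example values.

def kickback_v(l_henry: float, current_a: float, fall_time_s: float) -> float:
    """Voltage across the stray inductance for a linear current fall (adds to supply)."""
    return l_henry * current_a / fall_time_s


def clamp_energy_j(l_henry: float, current_a: float,
                   v_clamp: float, v_supply: float) -> float:
    """Energy absorbed by a clamp while the current ramps down.

    0.5*L*I^2 is stored in the inductance; the supply keeps pushing current
    during clamping, scaling it by v_clamp / (v_clamp - v_supply).
    """
    if v_clamp <= v_supply:
        raise ValueError("clamp voltage must exceed supply voltage")
    return 0.5 * l_henry * current_a ** 2 * v_clamp / (v_clamp - v_supply)


if __name__ == "__main__":
    l_stray = 1e-6   # hypothetical 1 uH of cable/loop inductance
    i_off = 200.0    # hypothetical current being interrupted, amps
    for t_fall in (10e-6, 1e-6, 0.1e-6):
        print(f"fall {t_fall * 1e6:5.1f} us -> {kickback_v(l_stray, i_off, t_fall):7.0f} V spike")
    e = clamp_energy_j(l_stray, i_off, 100.0, 52.0)
    print(f"clamp energy at 100 V clamp, 52 V pack: {e * 1e3:.1f} mJ")
```

Going from a 10 µs to a 0.1 µs fall time raises the unclamped spike from 20 V to 2000 V in this example, so something has to clamp it, and the closer the clamp voltage is to the pack voltage, the more energy it has to eat.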

Biduleohm's DIY BMS thread has some good discussion on MOSFET selection, turn-off avalanche energy, etc., but it is a loooong thread:
 
I've run JKs flat out at 200A and measured temperature. Once my new lab is up and running, I'll see if we can redo some of those in a controlled environment and I'll share the results.
 
Sorry, I should have put the question as 'did you use the BMS MOSFET temp readout for your observations'.

The numbers in the chart are based on the hottest of sixteen thermocouples mounted on the MOSFETs' metal tabs sticking out along the center row. Not every MOSFET was covered with a thermocouple, since my multi-channel thermocouple meter only has 16 inputs. To get the MOSFET die temp, I computed the higher die temperature from the package spec's thermal resistance to the metal tab pedestal.
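The die-temperature step described above amounts to T_die = T_tab + P·Rth(j-tab), with P = I²·Rds(on) for the FET in question. A sketch with hypothetical values; the real Rds(on) and junction-to-tab thermal resistance come from the MOSFET datasheet.

```python
# Junction temperature from a measured tab temperature:
#   T_die = T_tab + P * Rth(j-tab),  with P = I^2 * Rds(on) per MOSFET.
# All numbers below are hypothetical; use datasheet Rth and Rds(on).

def fet_power_w(current_a: float, rds_on_ohm: float) -> float:
    """Conduction loss in one MOSFET."""
    return current_a ** 2 * rds_on_ohm


def die_temp_c(tab_temp_c: float, power_w: float, rth_j_tab_c_per_w: float) -> float:
    """Die temperature estimated from the tab temperature and dissipated power."""
    return tab_temp_c + power_w * rth_j_tab_c_per_w


if __name__ == "__main__":
    i_fet = 20.0        # hypothetical current through the hottest FET, amps
    rds_hot = 4e-3      # hypothetical hot Rds(on), ohms
    rth = 0.8           # hypothetical junction-to-tab thermal resistance, C/W
    t_tab = 95.0        # hypothetical hottest thermocouple reading, C
    p = fet_power_w(i_fet, rds_hot)
    print(f"P = {p:.2f} W, estimated die temp = {die_temp_c(t_tab, p, rth):.1f} C")
```

With these example numbers the junction-to-tab step is only about a degree, which is in line with the point above that most of the temperature rise sits between the MOSFET and the outer cover, not inside the package.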

Generally, the MOSFETs closer to the center-of-BMS end of the MOSFET string are a bit hotter (the opposite end of the MOSFET string from the thermistor location).

The JK single-thermistor mounting for MOSFET temp is very poor and inconsistent. It typically sticks up in the gap between MOSFETs on only one of the four groups of MOSFETs, gooped in place with poorly heat-conducting white RTV adhesive. Nearly half the units I checked reported a BMS MOSFET temp lower than a thermocouple placed on the outside surface of the aluminum cover plate, which is always significantly cooler than the poorly heat-sunk MOSFET devices.

[Image: 16S full board picture with new jumper bars]
 
