Parallel Cell Capacity Balancing (PCCB) Procedure.

Without a diagram of your 3p4s system, it is hard to assess failure modes. While you might disagree with the author of the ORION BMS white paper referenced above (I will find it and post it here), the primary concern with parallel strings is the energy from one entire string dumping into the other due to a single failed cell. In a parallel cell configuration, this simply can't happen. Yes, one cell in a parallel configuration will dump into its mate if that mate fails (assuming no other protections), but that is much less energy than in, say, a 2P16S configuration. So if there is any possibility of this, it needs to be avoided; at the very least one would need separate breakers and BMSs per parallel string.

There are two separate issues in comparing BMS monitoring of serial vs parallel configurations: the first is whether separate BMSs per parallel string can detect faults that a single BMS cannot, and the second is what happens worst-case. The first question is really beyond the scope of what I would want to get into here, but suffice to say that even with separate voltage measurements per cell, I could envision cell faults that remain undetected within the BMS cell- and array-level voltage ranges (2.5 to 3.65 V). With regard to the second, there is still potentially a whole lot of fault current, and more importantly total energy, that could be dissipated in a parallel-string failure mode while remaining within the BMS fault limits. In contrast, the worst case for parallel cells is one cell dumping into the other; it can't get worse than that. Comparing parallel cells vs parallel strings, that is 1/16 of the energy for a 16S configuration.
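To put rough numbers on that, here is a back-of-the-envelope sketch assuming 100 Ah LiFePO4 cells at 3.2 V nominal (placeholder values, not anyone's actual cells):

```python
# Rough comparison of the two worst cases discussed above (assumed cell values).
CELL_V = 3.2        # nominal LiFePO4 cell voltage (V)
CELL_AH = 100.0     # assumed cell capacity (Ah)
SERIES_COUNT = 16   # the 16S example above

cell_energy_wh = CELL_V * CELL_AH                 # one cell dumping into its failed mate
string_energy_wh = SERIES_COUNT * cell_energy_wh  # one entire string dumping into another

print(f"parallel-cell worst case:   ~{cell_energy_wh:.0f} Wh")
print(f"parallel-string worst case: ~{string_energy_wh:.0f} Wh")
print(f"ratio: 1/{string_energy_wh / cell_energy_wh:.0f}")
```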



If we are really concerned about diagnostics (fault detection) and prognostics (detecting the onset of failure or accelerated end of life) of a LiFePO4 battery array, I think it is far more effective to monitor cross-currents in a parallel cell configuration; voltage just does not tell you that much. But this means a different topology than a standard BMS, which is probably beyond most people's capability to implement. That said, with an early warning, corrective action can be taken sooner, and the resulting downtime can be expected to be not only shorter but also planned.
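Just to illustrate the idea (this is not a feature of any standard BMS): if each cell in a parallel group had its own shunt, even a crude check like the hypothetical sketch below could flag a cell whose share of the group current drifts away from its mates. The function name and threshold are made up for illustration.

```python
def flag_outliers(cell_currents_a, rel_threshold=0.3):
    """Return indices of cells whose current deviates from the group mean
    by more than rel_threshold (as a fraction of the mean magnitude)."""
    mean = sum(cell_currents_a) / len(cell_currents_a)
    if abs(mean) < 1e-6:   # group essentially idle; skip to avoid divide-by-zero
        return []
    return [i for i, amps in enumerate(cell_currents_a)
            if abs(amps - mean) / abs(mean) > rel_threshold]

# Example: 3 cells in parallel sharing a 90 A discharge; cell index 2 is hogging current.
print(flag_outliers([25.0, 24.0, 41.0]))   # -> [2]
```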

Most grid-tied battery backup systems have separate critical-loads panels. We plan on a transfer switch so that the critical loads normally run effectively off-grid with battery backup. By means of the transfer switch, the critical circuits can be switched back to grid-tied, which allows plenty of time to perform any maintenance on a schedule, as permitted by the prognostics look-ahead times.


We can think of cell balance both in series (vertically) and in parallel (horizontally). Parallel strings automatically balance the string-average SOC, but not at the cell level. With passive or active balancing, some level of string SOC balance will tend to bring all cells into parallel balance, but that is indirect (by balancing all cells in all strings toward the same average voltage and SOC). In contrast, parallel cells will automatically balance in parallel without any BMS balancing. I don't have any experience or real data on the subject, but in my mind it remains to be seen whether, even after only 5 years, matched cells remain so well balanced as to justify having no form of active or passive balancing.

In summary, most of your points are quite valid, but I would counter that there are many ways to "skin a cat" (as I have proposed above), so my main focus is on safety under an absolute worst-case scenario. At this point, I would consider worst-case to mean limiting the maximum possible energy release under a single-point cell-short scenario. To my mind, parallel cells and good fire protection in a metal box are inherently the safest.
First I need to make sure we are using the same language, because I am not sure we are. 3p4s is understood to mean 3 cells are paralleled together, most often just by connecting them with bus bars, then treating that as a single cell. 4 of those in series would be 3p4s, and would use a standard 4s BMS. (3p4s means first parallel the cells, then series them)

4s3p would mean making a battery of 4 series cells with a normal BMS, then taking 3 of those (with 3 BMSs) and paralleling them together. First series them, then parallel them.
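Maybe a quick sketch helps show the difference in what the BMS (or BMSs) can actually see under the two arrangements, assuming 3.2 V / 100 Ah cells (made-up numbers, purely illustrative):

```python
CELL_V, CELL_AH = 3.2, 100.0   # assumed nominal cell values

def describe(series, parallel, series_first):
    volts = series * CELL_V
    amp_hours = parallel * CELL_AH
    if series_first:   # e.g. 4s3p: one BMS per 4s string, one channel per physical cell
        bms_count, channels_per_bms, cells_per_channel = parallel, series, 1
    else:              # e.g. 3p4s: one BMS, each channel sees a whole parallel group
        bms_count, channels_per_bms, cells_per_channel = 1, series, parallel
    print(f"{'4s3p' if series_first else '3p4s'}: {volts:.1f} V / {amp_hours:.0f} Ah, "
          f"{bms_count} BMS x {channels_per_bms} channels, "
          f"{cells_per_channel} cell(s) behind each channel")

describe(4, 3, series_first=False)   # 3p4s: 1 BMS, 3 cells hidden behind each channel
describe(4, 3, series_first=True)    # 4s3p: 3 BMSs, every physical cell has its own channel
```

Electrically the packs are identical; the difference is entirely in monitoring and protection.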

I am not sure which of those you mean when you say "parallel strings." Anyway, in a 4s3p configuration, one battery (a single 4s battery) cannot empty into another, both because of the fuse on that battery and because the BMS will disconnect if any cell goes out of spec.

In the 3p4s configuration, there is no way to protect the cells within a parallel group. The BMS only sees the whole group as a single cell. If one cell shorts, there will be massive current, but the voltage could possibly still be in spec for a short time. And even if it went out of spec, the cells are connected directly with bus bars, with no way to disconnect them. The whole battery would be turned off, but current between cells within the battery would still flow.

There are two separate issues in comparing BMS monitoring of serial vs parallel configurations: the first is whether separate BMSs per parallel string can detect faults that a single BMS cannot, and the second is what happens worst-case. The first question is really beyond the scope of what I would want to get into here.
Let's get into it, as it is a key issue, and really quite simple. In a 4s3p configuration, no two cells are paralleled together, and every single cell has its own connection to a BMS. If any single cell goes out of spec, the BMS turns off the battery, and all current through the battery and the defective cell stops. This would happen if a cell shorts, if a cell goes open, or if voltage goes out of spec. In a series circuit, current is the same throughout, so disconnecting the battery stops everything.
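A toy illustration of that per-cell logic (not any particular BMS firmware), using the 2.5 to 3.65 V limits mentioned earlier in the thread:

```python
CELL_MIN_V, CELL_MAX_V = 2.5, 3.65   # per-cell limits cited earlier in the thread

def string_allowed(cell_voltages):
    """Return False (i.e. open the FETs) if any individual cell is out of spec."""
    return all(CELL_MIN_V <= v <= CELL_MAX_V for v in cell_voltages)

print(string_allowed([3.31, 3.30, 3.32, 3.29]))   # True: keep conducting
print(string_allowed([3.31, 3.30, 3.95, 3.29]))   # False: one cell over-voltage, disconnect
```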

In a 3p4s configuration, in fact any time you have cells paralleled together, the BMS cannot monitor the individual cells. It monitors a pair or a group of cells that are in parallel. Further, the BMS has no capability to stop current within that group. The BMS can disconnect the battery, but that doesn't stop current in the parallel parts, only the series part.

In the 4s3p configuration, worst case the BMS disconnects, and all current stops. In the 3p4s configuration, worst case, the BMS disconnects, there is still a short that can't be disconnected, and a fire starts.

That said, I have seen it recommended, and I explored using fuses between parallel cells. That introduces a whole bunch of other issues, though, and I have never seen it done in practice. But if you want the absolute safest and are going to use a parallel-first configuration, that is what you would do.
 
Reading the Orion document now, btw.

The issues with eddy currents, I think, are much less serious than they conclude.

In any case, I think 4s3p is safer, but I still chose 3p4s for myself. I consider both very safe and both to work well. I don't think you will go wrong either way.
 
Reading the Orion document now, btw.

The issues with eddy currents, I think, are much less serious than they conclude.

In any case, I think 4s3p is safer, but I still chose 3p4s for myself. I consider both very safe and both to work well.
This was discussed somewhere before; mesh currents may be more appropriate, as that is how you would analyze the current sharing/equalization between parallel pairs. "Eddy current" seems to imply circular current paths and/or current direction changes, and I don't think there are any direction changes other than when you go from charging to discharging (and back) at the array level.
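For what it's worth, the loop ("mesh") analysis for two paralleled cells reduces to a single equation, I = (V1 - V2) / (R1 + R2). A tiny sketch with made-up internal resistances (illustrative, not measured data):

```python
def equalization_current(v1, v2, r1_internal, r2_internal):
    """Loop current (A) circulating between two paralleled cells, ignoring bus bar resistance."""
    return (v1 - v2) / (r1_internal + r2_internal)

# Healthy pair: a few mV difference, ~0.5 mOhm internal resistance each -> a few amps.
print(f"{equalization_current(3.335, 3.330, 0.0005, 0.0005):.1f} A")
# Failed mate: its EMF collapses toward 0 V (soft short) -> enormous, undisconnectable current.
print(f"{equalization_current(3.3, 0.0, 0.0005, 0.0005):.0f} A")
```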

upnorthandpersonal has suggested that most of the topology/features described in the white paper (using diodes and mechanical relays) already exist in a modern BMS. I would say close, but not under a failure-modes analysis. A relay will fail differently from the FET that it is intended to model: FETs fail shorted about 90% of the time, whereas relays have different failure modes (burned coil, welded contacts, burned contacts, etc.). The diode is used to model the FET body diode, which is reasonable.

I don't think you will go wrong either way.

Yes, I think it is difficult to find a convincing argument that one is much safer without entertaining some yet-to-be-invented current-detection circuitry and an array of I2C/SPI microcontrollers.
 
This was discussed somewhere before; mesh currents may be more appropriate, as that is how you would analyze the current sharing/equalization between parallel pairs. "Eddy current" seems to imply circular current paths and/or current direction changes, and I don't think there are any direction changes other than when you go from charging to discharging (and back) at the array level.

upnorthandpersonal has suggested that most of the topology/features described in the white paper (using diodes and mechanical relays) already exist in a modern BMS. I would say close, but not under a failure-modes analysis. A relay will fail differently from the FET that it is intended to model: FETs fail shorted about 90% of the time, whereas relays have different failure modes (burned coil, welded contacts, burned contacts, etc.). The diode is used to model the FET body diode, which is reasonable.



Yes, I think it is difficult to find a convincing argument that one is much safer without entertaining some yet-to-be-invented current-detection circuitry and an array of I2C/SPI microcontrollers.
You bring up another point: FET vs contactor. I was reluctant to buy a FET BMS at first, but did because of lower complexity and cost. I am now a fan. Yes, FETs will usually fail shorted, but kept within operating parameters they are more reliable than their mechanical counterparts. They are also very fast. The Overkill/JBD BMS has been tested with multiple shorts, and the FETs turn off before any fuse will blow, stopping current before any damage. I have tested this on my own. I doubt a BMS that uses a relay could do this. You would also need a large and expensive relay to be able to break 500A of DC.

It is also a simple matter to perform a monthly test of the FET devices.

As an aside on how fast protection devices can be: part of my job is maintaining high-powered transmitters (>10,000 W). Some of them operate at 36 kV. A failure at that voltage can cause over $100k of damage. The specification for the crowbar circuit is that, given a direct short, the crowbar must remove power before that 36 kV can damage a 36 AWG wire (not much thicker than a human hair). That is exactly how we test it. We have a fixture with about 2 feet of 36 AWG wire. We connect it directly to the amplifier power supply. A foot switch on a 30 ft air hose activates a plunger to close the circuit. If it fails, there is a spectacular show as the wire instantaneously vaporizes and splatters metal all over the inside of the fixture. If it passes, the transmitter simply turns off (very quickly). The crowbar device in this case is a thyratron; they fail and need to be replaced every few years.

Over the past few decades, relay logic has been replaced by FETs in the transmitters. The relays required frequent replacement, and we always had a dozen on hand. The FETs seem to last forever. Granted, the high-power FETs in the final amplifier stage do fail, but those are pushed to the edge of their parameters and still last a long time.

For very large currents (engine starting, etc.) a relay is probably safer. But for under 150A or so, I think a FET would last longer.
 
For very large currents (engine starting, etc.) a relay is probably safer. But for under 150A or so, I think a FET would last longer.
You need both.
My point was that in a failure analysis you have to work with the actual parts in the circuit, not stand-ins.
The primary FET failure mode seems to be ESD or other load-dump events blowing out the gate. You can design against that as well.
Not sure about the relevance of the high-speed crowbar performance, but it is nevertheless interesting.

In engineering reliability analysis you use MTBF and typically MIL-HDBK-217.

You have various derating factors for the environment. Ground Fixed (vs, say, Airborne Uninhabited Fighter) is used in this analysis for a power supply.

Once environmental factors are considered, the basic failure rate of 12 FIT has increased to 2,131.2 FIT. As MTTF is the inverse to failure rate, this results in a MTTF for the power MOSFET of about 470,000 hours. The initial MTTF however, without calculating the environmental and thermal factors, was about 83,000,000 hours.

470k hours MTBF is about 53 years.
83,000k hours MTBF is about 9,474 years.
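The arithmetic behind those numbers, for anyone following along (FIT is failures per 10^9 device-hours; small rounding differences from the figures above):

```python
HOURS_PER_YEAR = 8760

def fit_to_mttf_hours(fit):
    """FIT is failures per 1e9 device-hours; MTTF is its inverse."""
    return 1e9 / fit

for label, fit in [("base rate, 12 FIT", 12), ("derated, 2131.2 FIT", 2131.2)]:
    hours = fit_to_mttf_hours(fit)
    print(f"{label}: {hours:,.0f} h (~{hours / HOURS_PER_YEAR:,.0f} years)")
```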

But remember what MTBF means: if this were a fault that continued to catastrophic failure, there is likely a 50% probability your house will have burned down by then. The point is that while a FET is very reliable, the consequence, and therefore the risk, is still very high.

IIRC, MIL-HDBK-217 largely covers environmental vibration and temperature and does not consider the electrical stresses of ESD/load dump. Those are the three primary degraders of electronics.

 
The eddy current thing is apparently a terrible misnomer used to describe current flowing between parallel cells within a pack due to different cell voltages, which results in some (but not significant) passive balancing.

Or that's what I gathered last time it came up and I got butthurt about the term being used improperly.
 
You need both.
My point was that in a failure analysis you have to work with the actual parts in the circuit, not stand-ins.
The primary FET failure mode seems to be ESD or other load-dump events blowing out the gate. You can design against that as well.
Not sure about the relevance of the high-speed crowbar performance, but it is nevertheless interesting.
It was just interesting. It isn't really relevant.

In engineering reliability analysis you use MTBF and typically MIL-HDBK-217.

You have various derating factors for the environment. Ground Fixed (vs, say, Airborne Uninhabited Fighter) is used in this analysis for a power supply.

Once environmental factors are considered, the basic failure rate of 12 FIT has increased to 2,131.2 FIT. As MTTF is the inverse to failure rate, this results in a MTTF for the power MOSFET of about 470,000 hours. The initial MTTF however, without calculating the environmental and thermal factors, was about 83,000,000 hours.

470k hours MTBF is about 53 years.
83,000k hours MTBF is about 9,474 years.

But remember what MTBF means: if this were a fault that continued to catastrophic failure, there is likely a 50% probability your house will have burned down by then. The point is that while a FET is very reliable, the consequence, and therefore the risk, is still very high.
But will a FET failure ever result in a fire? FETs are used to turn off current to protect a battery from overcharge/overdischarge; they protect the cells. Overcurrent that could cause a fire is usually handled by a fuse. I suppose there is a case where a FET fails to stop charging and a battery overcharges to the point where it overheats and starts a fire. But that is a multiple-failure situation, as the charger would have had to fail as well; a charger should stop well before the FETs disconnect. And even if the battery is very far out of balance and a FET fails to stop charging, that probably wouldn't overcharge the out-of-balance cell enough to start a fire. One of the really nice features of the Orion BMS that is missing on a FET BMS is that a relay output can connect to an alarm. So there can be an alarm BEFORE the disconnect, or if there is a disconnect but the battery continues to overcharge. That gives you another level of protection in the event of a failure.

And relays fail too, not usually by welding closed, but the kind of high current that would start a fire is also the kind of current that could do that. There are also other safety considerations. My installation is in a boat. Obviously a fire is a huge concern on a boat, but a failure of a relay due to a burned coil or contact damage that presents as an open could also be life-threatening, as I could lose life-saving equipment at a critical time. Imagine losing control of an engine, or of electronic navigation equipment, at a critical moment. Those are the types of failures that will be much more common with a relay.

A proper analysis is well beyond me. Interesting to think about.

It is also interesting to compare MTBFs. When I was in high school decades ago, I was reading about the MTBF of spinning hard drives. The MTBF of each individual part was on the scale of a million hours, and somehow they mathematically calculated the MTBF of the HDD built from all of those parts to be a few hundred thousand hours. And yet very, very few hard drives ever last that long. MTBF was a statistical model based on the reliability of individual parts in a controlled environment. It had little to do with what actually happened when those parts were in an assembly in real-world conditions. I am sure the marketing department had some influence as well.
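I think the way they got that drive-level number was just adding the parts' failure rates (a "series" reliability model, where every part is required). A sketch with made-up part MTBFs, purely to show the shape of the calculation:

```python
# Made-up part MTBFs in hours, chosen only to illustrate the series model.
parts = {
    "spindle motor":  1_000_000,
    "head assembly":  1_200_000,
    "controller PCB": 2_000_000,
    "actuator":       1_500_000,
}

total_failure_rate = sum(1.0 / mtbf for mtbf in parts.values())   # failures per hour
system_mtbf = 1.0 / total_failure_rate

print(f"system MTBF: {system_mtbf:,.0f} hours")   # a few hundred thousand hours
```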



IIRC, MIL-HDBK-217 largely covers environmental vibration and temperature and does not consider the electrical stresses of ESD/load dump. Those are the three primary degraders of electronics.

 
But will a FET failure ever result in a fire?
This is the subject of what is called, in engineering, Failure Modes and Effects Analysis (FMEA). You are scratching the surface here in trying to answer the question "What happens if this part fails?" This is why I cautioned about and criticized the ORION document: it was showing a relay equivalent for a FET while calling itself an electrical engineering document by real electrical engineers (or something to that effect). In fact, it is more of an educational document for non-technical people to understand the issues of cell arrays. The motivation is to let real engineers engineer systems, and to have non-technical people just buy the finished product (theirs).

I can't fault the authors, because I'm sure they are aware of basic reliability engineering, but they have been forced to dumb it down to where it becomes counterproductive. But again (in my opinion), the intent is to steer you away from designing a system (if you are not qualified), not to tell you how to design a system. You need a degree and years of experience doing actual design work.

MTBF and MIL-HDBK-217 answer the question "Given environmental deratings, what is the likelihood of a component failure?" These are complementary analyses.
MTBF was a statistical model based on the reliability of individual parts in a controlled environment. It had little to do with what actually happened when those parts were in an assembly in real-world conditions. I am sure the marketing department had some influence as well.
The environments are defined, but not controlled in the sense (I think you mean) of benign. Depending on the temperature extremes and mechanical vibration environment, the MTBFs are derated relative to a truly "controlled" and benign environment. See the calculations I highlighted and reread that reference. Yes, it is statistical, and it is intended to estimate failures across a population of devices.
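For what it's worth, the part-stress models in MIL-HDBK-217 have the general form of a base failure rate multiplied by a string of pi derating factors (temperature, application, quality, environment). A minimal sketch of that form is below; the individual pi values are placeholders I picked so the product reproduces the 12 FIT to 2,131.2 FIT jump quoted above, not the actual handbook table values.

```python
def part_stress_fit(lambda_base_fit, pi_t, pi_a, pi_q, pi_e):
    """Derated failure rate in FIT: lambda_p = lambda_b * pi_T * pi_A * pi_Q * pi_E."""
    return lambda_base_fit * pi_t * pi_a * pi_q * pi_e

benign = part_stress_fit(12, pi_t=1.0, pi_a=1.0, pi_q=1.0, pi_e=1.0)     # no derating applied
derated = part_stress_fit(12, pi_t=4.44, pi_a=8.0, pi_q=1.0, pi_e=5.0)   # placeholder factors

print(f"benign: {benign:.0f} FIT, derated: {derated:.1f} FIT")   # 12 FIT -> 2131.2 FIT
```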

We could debate the "philosophy of engineering", but nevertheless, this is how formal engineering processes have evolved for better or for worse.
 