#1
Slaying Bad Memes
Software Risk Analysis

SOFTWARE PROBABILISTIC RISK ANALYSIS (SPRA)

I have finally determined what my new job is all about. They want me to come up with a way of assigning a probability to a module of software: the probability that it will execute in such a way that loss of assets (LOA) will occur.

This isn't Quality Assurance. This isn't Software Reliability Analysis. This isn't Software Verification and Validation. So, don't go there!

You may ask, "What's the difference between SW PRA and SQA, SRA, SVV?" Excellent question. When I have a good answer, I'll get back to you. But for the nonce, just accept that it is a different animal altogether.

--What is the probability that LOA will occur because of SW?--

Hardware PRA is fairly straightforward, and I've been learning a lot about that lately. But software PRA is... is... currently undefined. The best I have done so far is build the following metaphor:

The "mission" takes place in "TEFO", a 4-dimensional space whose dimensions are Time, Environment, Function and (human) Operation. Knowing what you want the mission to accomplish, you carve out a cavern of "nominality" (nominal, expected) within TEFO that makes room for the mission: you settle on regions that define the expected Environment the mission will deal with (high G's, high speeds, etc.), the Functions it will have to perform (takeoff, navigation, communication, etc.) and the human Operations that the people involved (including crew) will execute, all as a function of Time.

Within the TEFO cavern, we assemble our Requirements, like a scaffolding that defines the outer limits of our mission. Hung from that scaffolding, the actual mission is constructed out of hardware, software and people. Within the volume of the actual mission itself, we define a smaller volume (let's call it the 99% envelope), and this defines the boundaries of all our testing and verification.

So, SQA, SRA and SVV all apply to that portion of the mission within the 99% envelope.

A "mission" would be the actual flight. At any given instant in time, there is a point in TEFO that defines the "mission event state". This point traces out the "mission event timeline", which starts at, say, ignition, and goes all the way to, say, landing.

--What is the probability that LOA will occur because of SW?--

We can assume that SQA, SRA and SVV can (mostly) eliminate flaws and bugs within the 99% envelope. But SW PRA, as I currently understand it, addresses the probability that the mission event timeline will drift out of the 99% envelope, maybe even outside the requirements envelope, AND result in LOA because the software fails to respond adequately to these boundary (edge) conditions.

NOTE: The software doesn't necessarily have to FAIL. It could be doing exactly what it was designed to do. Remember, there are other dimensions: the Environment and the Operations of human beings.

THE BIG Q: Does anyone out there know of resources, people, books or projects, or have personal experience, applicable to this problem???

In the meantime, I will use this thread as a kind of blog to record whatever I find out.

----------------
Hypography Forums Moderator
What concerns me is not the way things are, but rather the way people think things are. (Epictetus, Greek Philosopher)
The map is NOT the territory. (Korzybski, Polish-American Philosopher)

Last edited by Pyrotex; 12-19-2007 at 08:21 PM.
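The "drift out of the 99% envelope" idea above can be made concrete with a Monte Carlo sketch. Everything in it is an assumption for illustration, not part of the method described in the post: the Environment, Function and Operation coordinates are each reduced to a single number, the 99% envelope is a box (`ENVELOPE`, with made-up bounds), the mission event timeline is a Gaussian random walk with Time as the step index, and `p_sw_inadequate` is an assumed conditional probability that the software responds inadequately once the timeline is outside the envelope.

```python
import random

# HYPOTHETICAL 99% envelope: allowed range for each non-time TEFO coordinate.
# The bounds are illustrative only, not derived from any real mission.
ENVELOPE = {
    "environment": (-3.0, 3.0),
    "function": (-3.0, 3.0),
    "operation": (-3.0, 3.0),
}


def timeline_leaves_envelope(steps: int, step_sd: float, rng: random.Random) -> bool:
    """Simulate one mission event timeline as a Gaussian random walk in the
    Environment/Function/Operation coordinates (Time is the step index) and
    report whether it ever crosses the envelope boundary."""
    state = {axis: 0.0 for axis in ENVELOPE}  # start at the nominal centre
    for _ in range(steps):
        for axis, (lo, hi) in ENVELOPE.items():
            state[axis] += rng.gauss(0.0, step_sd)
            if not lo <= state[axis] <= hi:
                return True
    return False


def estimate_p_loa(runs: int, steps: int, step_sd: float,
                   p_sw_inadequate: float, seed: int = 1) -> tuple[float, float]:
    """Return (P(exit envelope), P(LOA due to SW)).

    The first value is a Monte Carlo estimate; the second multiplies it by an
    ASSUMED conditional probability that the software responds inadequately
    once the timeline is outside the envelope."""
    rng = random.Random(seed)
    exits = sum(timeline_leaves_envelope(steps, step_sd, rng) for _ in range(runs))
    p_exit = exits / runs
    return p_exit, p_exit * p_sw_inadequate


if __name__ == "__main__":
    p_exit, p_loa = estimate_p_loa(runs=5_000, steps=100, step_sd=0.1,
                                   p_sw_inadequate=0.05)
    print(f"P(exit envelope) ~ {p_exit:.4f}, P(LOA via SW) ~ {p_loa:.5f}")
```

Under these assumptions the simulation part is mechanical; the hard part of SW PRA is concentrated in `p_sw_inadequate`, which is exactly the number the post is asking how to estimate.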
#2
Wedding Planner
Re: Software Risk Analysis
Quote:

----------------
Hypography Science Forums Moderator
"There are no passengers on Spaceship Earth. We are all crew." - Marshall McLuhan
"We must not forget that when radium was discovered no one knew that it would prove useful in hospitals. The work was one of pure science. And this is a proof that scientific work must not be considered from the point of view of the direct usefulness of it." - Marie Curie
#3
Resident Slayer

I know this from nothing... But if you want to gain some legitimacy, it may be worthwhile to look at the "accepted" approach for hardware, to wit: the basic idea on the hardware side is that you have a countable number of components, each with a mean time between failures (MTBF), and a system analysis that takes criticality and redundancy into account to determine the time before system failure (ignoring for the moment whether it's mission or crew: that's an implementation detail!).

It would appear that "software is completely different," but I'll argue that it's not: "components" in software are "program modules," and while it's hard to assign a specific MTBF to a specific module since it's one of a kind, what you can do is pull out one of those weird computer science concepts that not many people pay attention to any more: the good ol' "function point." While it's been a long time since I've gone near them, I'm pretty sure you can scare up statistics on "MTBF per function point," with appropriate analysis of the types of FPs and so on, and end up using a methodology that's exactly like what's done on the hardware side.

Now if I were actually an astronaut, I'd be scared to death of this kind of "analysis" ("That's a bunch of crap," is what Gus Grissom would probably say), but it's probably just as justifiable as the hardware stuff is...

You pulled it out of where?

Buffy

----------------
"If you do not agree with anything I say, I'll not only retract it, but deny under oath that I ever said it!" -- Tom Lehrer
"The shrinks diagnosed me a sociopath with paranoid delusions. But they’re just out to get me cause I threatened to kill them."
Forum Administrator
Hypography Science Forums - Science for Boys and Girls! It's not for nothing that we hang out here.
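The hardware-style recipe in this post (component MTBFs plus a system analysis of criticality and redundancy) can be sketched in a few lines. This is a minimal sketch under stated assumptions: constant failure rates (so a unit survives t hours with probability exp(-t/MTBF)), independent components, every component mission-critical (a series system), and redundancy modeled as identical parallel units. The `software_module` helper and its `failures_per_fp_hour` rate are hypothetical; no real function-point statistics are implied.

```python
import math
from dataclasses import dataclass


@dataclass
class Component:
    name: str
    mtbf_hours: float      # mean time between failures of one unit
    redundancy: int = 1    # identical units in parallel; any one suffices


def unit_reliability(mtbf_hours: float, mission_hours: float) -> float:
    """Probability one unit survives the mission, assuming a constant failure
    rate lambda = 1 / MTBF (exponential model)."""
    return math.exp(-mission_hours / mtbf_hours)


def component_reliability(c: Component, mission_hours: float) -> float:
    """k identical, independent units in parallel fail only if all k fail."""
    q = 1.0 - unit_reliability(c.mtbf_hours, mission_hours)
    return 1.0 - q ** c.redundancy


def system_reliability(components: list[Component], mission_hours: float) -> float:
    """Every component is treated as critical, i.e. a series system:
    the system survives only if every component survives."""
    r = 1.0
    for c in components:
        r *= component_reliability(c, mission_hours)
    return r


def software_module(name: str, function_points: int,
                    failures_per_fp_hour: float) -> Component:
    """HYPOTHETICAL: derive a module MTBF from its function-point count and an
    assumed historical failure rate per function point per hour."""
    return Component(name, mtbf_hours=1.0 / (function_points * failures_per_fp_hour))
```

For example, `system_reliability([Component("pump", 2000.0, redundancy=2), Component("valve", 10000.0), software_module("nav", 400, 1e-7)], 50.0)` treats a made-up 400-FP navigation module exactly like one more box on the block diagram, which is the essence of the argument; the numbers are illustrative only.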
#4
Slaying Bad Memes
Re: Software Risk Analysis

Thanks dudes and dudettes.

The most famous incident where SW killed someone was the Therac-25. A software error passed through exhaustive testing and never caused any problems until a rare sequence of operations triggered it.

SW PRA is different from the other analyses because the output of SW PRA (just like HW PRA) is a number: the probability that that particular piece of SW will fail, or enable a failure, and cause damage. It's not trying to fix the SW or even test it. It's not a methodology for creating good SW. It's not even about FINDING bugs or flaws.

What is the probability (per mission or per hour of operation) that software will cause hurt or damage? That's it. And maybe: what is the probable number of serious bugs still in the code?

Last edited by Pyrotex; 12-19-2007 at 08:23 PM.
#5
Wedding Planner
Re: Software Risk Analysis
Quote:

Quote:

Last edited by freeztar; 12-18-2007 at 07:20 PM. Reason: Deletion of a rogue idea
#6
Creating

How about this approach?
will be nearly 1.

----------------
Moderator: Computers and Technology; Medical Science; Science Projects and Homework; Philosophy of Science; Physics and Mathematics; Environmental Studies
#7
Slaying Bad Memes
Re: An empirical estimation approach using a simulation
Quote:

Your approach, like Buffy's, is how hardware (HW) risk analysis has been done since at least the 1960's, nearly 50 years, maybe more: the empirical approach. Put 10,000 widgets in the fottasite and run them for 1,000 hours, which is 10,000,000 widget-hours. If 2,081 widgets fail, the probability of failure per hour is 2,081/10,000,000, or 0.00021 (0.021%). If the widget has to run for 50 hours during a mission, then the total probability of failure is about 1 in 100 (50 × 0.00021 ≈ 0.01).

But when you try to do SW that way, you can download 10,000 copies of it into a computer, and the computer HW may fail, but every copy of the SW will behave the same. They may all fail. None of them may ever fail. They may all fail once every 7 hours for unknown reasons. But there's no difference among identical copies of 1s and 0s, given that they are subjected to the same test. For SW, the empirical approach produces bad, nonsensical, untrustworthy or uninterpretable results.

My bosses have committed that they will, for the first time ever, come up with a way of assigning a probability of failure to SW. So far, it appears to me that they have tried and failed. Now, my data may be incomplete as hell or just dead wrong. In case anybody is reading this post, I hereby disavow everything I have said so far. But I am left with the conclusion that SW PRA is not easy and not straightforward.

Last edited by Pyrotex; 12-19-2007 at 08:25 PM.
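The widget arithmetic and the point about identical copies can both be sketched briefly. Assumptions: the failure rate is failures per total unit-hours, exactly as computed in the post (which counts the full 10,000 × 1,000 hours even though a failed widget stops accumulating time), and the mission probability uses a constant-rate (exponential) model. `flight_software` is a hypothetical stand-in for any deterministic module.

```python
import math


def failure_rate_per_hour(units: int, hours: float, failures: int) -> float:
    """Failures divided by total unit-hours of exposure, treating every unit
    as exposed for the full test, as in the widget example."""
    return failures / (units * hours)


def mission_failure_probability(rate_per_hour: float, mission_hours: float) -> float:
    """Probability of at least one failure during the mission, assuming a
    constant failure rate (exponential model)."""
    return 1.0 - math.exp(-rate_per_hour * mission_hours)


def flight_software(sensor_reading: float) -> str:
    """HYPOTHETICAL stand-in for a deterministic software module."""
    return "ABORT" if sensor_reading > 0.9 else "CONTINUE"


rate = failure_rate_per_hour(units=10_000, hours=1_000, failures=2_081)
p_mission = mission_failure_probability(rate, mission_hours=50)
print(f"rate = {rate:.5f}/h, P(fail in 50 h) = {p_mission:.4f} (about 1 in {1 / p_mission:.0f})")

# 10,000 identical copies given the same input produce exactly one distinct
# behaviour, so the "sample" carries no more information than a single run.
outcomes = {flight_software(0.95) for _ in range(10_000)}
print(outcomes)
```

This prints a rate of about 0.00021 per hour and a 50-hour mission probability of about 0.0104 (roughly 1 in 97, in line with "about 1 in 100"), followed by a set containing only `'ABORT'`: ten thousand runs, one distinct behaviour.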
#8
Resident Slayer
Re: An empirical estimation approach using a simulation
Quote:

I know this has been going on for a long time, and the question is why the response to the people who are doing it is still "I dunno"... The development managers I know have been keeping statistics like this (usually per line of code rather than per FP) for ages, just to rate programmers when promotion/layoff time comes. Even if there are more complexities than I've mentioned here (and I know there are!), there's got to be enough data to start giving "fuzzy" numbers that are probably no less "justifiable" than the "test 2000 vacuum tubes" method...

Quote:

Tell me you're really doing this just so they let you do Hypography at work...

Work expands to fill the available time,
Buffy
#9
Slaying Bad Memes
Re: An empirical estimation approach using a simulation
Quote:
#10
Creating
Quote:

Not to disparage the invention of acronyms and the disciplines that accompany them (some of the high points of my career have involved the invention of acronyms), but a lot of definition will be necessary before anybody can understand or contribute much in detail to any conversation about it.

The number and scope of questions that spring to mind in an effort to create this definition overwhelm my ability to render them very coherently, so I’ll just let fly with one:

Does the software failure have to be a “sin of commission or omission” (e.g., turning on/off, or failing to turn on/off, some piece of the spacecraft at a critical moment such that all goes boom), or are “sins of ignorance” (e.g., not detecting and alarming on, or reacting to, an anomalous situation) also grounds to conclude that a failure is due to software?

If sins of ignorance do count, arguably any failure can be cast as a failure of software: for example, the 1986 Challenger disaster could have been made survivable had sensors and software detected and reacted to the SRB abnormalities by separating the orbiter from them prior to the explosion.

If they do not count, what is to prevent the probability of failure due to software from being made effectively zero by not having any software, even though such a vehicle would almost certainly have much lower performance and a much greater probability of failure due to non-software causes?
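The last question above (what stops the software-attributed probability from being driven to zero simply by flying no software?) can be illustrated with a two-term model. This is a minimal sketch assuming the software and non-software causes of loss are independent; `VehicleDesign`, both designs and all probabilities are hypothetical numbers chosen only to show the effect.

```python
from dataclasses import dataclass


@dataclass
class VehicleDesign:
    name: str
    p_sw: float      # per-mission probability of a software-caused loss
    p_other: float   # per-mission probability of loss from all other causes


def p_loss(design: VehicleDesign) -> float:
    """Probability of losing the mission, assuming the software and
    non-software failure causes are independent."""
    return 1.0 - (1.0 - design.p_sw) * (1.0 - design.p_other)


# Illustrative numbers only: the software-free design scores a perfect
# p_sw = 0 but, lacking automated detection and response, carries a
# higher non-software risk.
with_sw = VehicleDesign("with software", p_sw=0.002, p_other=0.010)
no_sw = VehicleDesign("no software", p_sw=0.0, p_other=0.030)

for d in (with_sw, no_sw):
    print(f"{d.name:14s} p_sw={d.p_sw:.3f}  total={p_loss(d):.4f}")
```

With these made-up numbers the software-free design has a perfect software score but a total loss probability of 0.0300 versus about 0.0120, which is the trap the question points at: a metric that counts only software-attributed risk can reward the riskier vehicle.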
All times are GMT -8.