Bender & Associates Inc.
46 Digital Drive, Suite 5
Novato, CA 94949
(415) 884-4380
(415) 884-4384 fax
rbender@softtest.com
April 1996
(Revision #4)
KPA REVIEW GROUP
The following have been gracious enough to serve as reviewers of this proposed Testing KPA. I want to thank them for their insights and contributions; however, I take full responsibility for any problems or omissions the reader may find in this document. This is very much a work in progress. Please feel free to contact me with suggestions for improving it.
Boris Beizer - Independent testing consultant
Greg Daich - STSC
Dave Gelperin - SQE
Bill Hetzel - SQE
Capers Jones - SPR
John Musa - AT&T
William Perry - QAI
Robert Poston - IDE
The original version of the Evaluation and Testing KPA was sponsored by Xerox Corporation. They have graciously allowed us to distribute it to the software community. The key contact is:
David Egerton
800 Phillips Road
Building 129
Webster, NY 14580
(716) 422-8822
TABLE OF CONTENTS
1. INTRODUCTION
2. DEFINING EVALUATION AND TEST
3. THE JUSTIFICATION FOR A SEPARATE EVALUATION AND TEST KPA
3.1 Accelerating Cultural Change
3.2 The Role Of Evaluation And Test In Project Tracking
3.3 Evaluation and Test As A Percentage Of The Project Costs
3.4 Impact Of Evaluation and Test On Development Schedules And Project Costs
3.5 The Cost Of Defects
4. THE PROPOSED SOFTWARE EVALUATION AND TEST KPA
4.1 Goals
4.2 Commitment To Perform
4.3 Ability To Perform
4.4 Activities Performed
4.5 Measurement And Analysis
4.6 Verifying Implementation
5. RECONCILING WITH THE EXISTING CMM KPAs
5.1 Leveling The Evaluation And Testing KPA Within The CMM
5.2 Repackaging Suggestions For The Existing KPAs
1. INTRODUCTION
The objective of this document is to present a proposal that Evaluation and Test become a Key Process Area (KPA) in the SEI Capability Maturity Model (CMM). The first section addresses the scope of what is meant by evaluation and test. The second section identifies the justifications for making this a separate KPA. The third section presents the proposed KPA definition including: definition, goals, commitment to perform, ability to perform, activities performed, measurement and analysis, and verifying implementation. The final section addresses integrating this KPA with the existing KPAs. This includes identifying which level to assign it to and some repackaging suggestions for existing KPAs.
A note on "verification" and "validation", which are strictly defined in ISO 9000:
- Verification: confirmation, through examination and the provision of objective evidence, that specified requirements have been fulfilled. "Verification" emphasizes the specified requirements.
- Validation: confirmation, through examination and the provision of objective evidence, that the requirements for a specific intended use have been fulfilled. "Validation" emphasizes the requirements of the intended use.
Purpose:
- Verification confirms that the outputs of a design stage satisfy the input requirements of that stage.
- Validation confirms, through the product, that the design satisfies the requirements of actual use.
Object:
- Verification is applied to design outputs such as documents, calculations, or samples.
- Validation is applied to the final product (or a sample of it).
Participants:
- Verification is usually performed by the design organization.
- Validation must involve the users, or people who can represent the users' requirements.
Timing:
- Verification takes place at appropriate stages of design, typically when a design stage produces its outputs.
- Validation takes place after successful design verification, generally on the final product; it may also be performed in stages.
2. DEFINING EVALUATION AND TEST
(Verification and Validation)
Evaluation is the activity of verifying the various system specifications and models produced during the software development process. Testing is the machine based activity of executing and validating tests against the code. Most software organizations define evaluation and test very narrowly. They use it to refer to just the activities of executing physical test cases against the code. In fact, many companies do not even assign testers to a project until coding is well under way. They further narrow the scope of this activity to just function testing and maybe performance testing.
This view is underscored in the description of evaluation and test in the current CMM. It is part of the Software Product Engineering KPA. The activities in this KPA, activities 5, 6, and 7, use only code based testing for examples and explicitly mention only function testing. Other types of testing are euphemistically referenced by the phrase “...ensure the software satisfies the software requirements”.
People who build skyscrapers, on the other hand, thoroughly integrate evaluation and test into the development process long before the first brick is laid. Evaluations are done via models to verify such things as stability, water pressure, lighting layouts, power requirements, etc. The software evaluation and test approach used by many organizations is equivalent to an architect waiting until a building is built before testing it and then only testing it to ensure that the plumbing and lighting work.
The CMM further compounds the limited view of evaluation and test by making a particular evaluation technique, peer reviews, its own KPA. This implies that prior to the delivery of code the only evaluation going on is via peer reviews and that this is sufficient. The steps in the evaluation and test of something are: define the completion/success criteria, design cases to cover those criteria, build the cases, perform/execute the cases, verify the results, and verify that everything has been covered. Peer reviews provide a means of executing a paper based test. They do not inherently provide the success criteria, nor do they provide any formal means for defining the cases, if any, to be used in the peer review. They are also fundamentally subjective. Therefore, the same misconceptions that lead a programmer to introduce a defect into the product may cause them to miss that defect in the peer review.
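The six steps above can be sketched as a small checklist structure. This is purely an illustrative sketch in Python; the `Case` type, the function, and the sample criteria are hypothetical, not part of the proposal:

```python
from dataclasses import dataclass
from typing import Callable, List

@dataclass
class Case:
    name: str
    covers: List[str]               # which success criteria this case exercises
    check: Callable[[str], bool]    # executing the case against the deliverable

def evaluate_deliverable(deliverable, criteria, cases):
    # Step 1: the completion/success criteria must exist before anything else.
    assert criteria, "no success criteria defined"
    # Steps 2-3 (design and build the cases) are the construction of `cases`.
    # Steps 4-5: perform each case and verify its result.
    failures = [c.name for c in cases if not c.check(deliverable)]
    # Step 6: verify that every criterion is covered by at least one case.
    covered = {crit for c in cases for crit in c.covers}
    uncovered = [crit for crit in criteria if crit not in covered]
    return {"failures": failures, "uncovered": uncovered,
            "complete": not failures and not uncovered}
```

A deliverable counts as complete only when every case passes and every criterion is covered by some case.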
A robust scope for evaluation and test must encompass every project deliverable at each phase in the development life cycle. It must also address each desired characteristic of each deliverable, and it must address each of the evaluation/testing steps. Let's look at two examples: evaluating requirements and evaluating a design.
A requirements document should be complete, consistent, correct, and unambiguous. One step is to validate the requirements against the project/product objectives (i.e., the statement of “why” the project is being done). This ensures that the right set of functions is being defined. Another evaluation is to walk use-case scenarios through the functional rules, preferably aided by screen prototypes if appropriate. A third evaluation is a peer review of the document by domain experts. A fourth is to do a formal ambiguity review by non-domain experts. (Because they cannot read assumed functional knowledge into the document, this review helps ensure that the rules are defined explicitly, not implicitly.) A fifth evaluation is to translate the requirements into a Boolean graph. This identifies issues concerning the precedence relationships between the rules as well as missing cases. A sixth is a logical consistency check with the aid of CASE tools. A seventh is the review, by domain experts, of the test scripts derived from the requirements. This “bite-size” review of the rules often uncovers functional defects missed in reviewing the requirements as a whole.
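The fifth evaluation, translating requirements into Boolean form, can be illustrated by enumerating condition combinations and looking for ones that no rule covers. The account-transaction rules below are hypothetical, invented for the sketch:

```python
from itertools import product

# Hypothetical functional rules written as Boolean predicates.
conditions = ["valid_account", "funds_available"]
rules = [
    lambda a: a["valid_account"] and a["funds_available"],   # rule: approve
    lambda a: not a["valid_account"],                        # rule: reject
]

def missing_cases(conditions, rules):
    """Enumerate every condition combination; report those no rule covers."""
    gaps = []
    for values in product([False, True], repeat=len(conditions)):
        assignment = dict(zip(conditions, values))
        if not any(rule(assignment) for rule in rules):
            gaps.append(assignment)
    return gaps

gaps = missing_cases(conditions, rules)
# The rules say nothing about a valid account without funds: a missing case.
```

Mechanical enumeration of this kind is exactly what exposes the cases a prose review tends to skip over.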
Evaluating a design can also take a number of tacks. One is walking tests derived from the requirements through the design documents. Another is building a model to verify design integrity (e.g., a model built of the resource allocation scheme for an operating system to ensure that deadlock never occurs). A third is building a model to verify performance characteristics. A fourth is comparing the proposed design against existing systems at other companies to ensure that the expected transaction volumes and data volumes can be handled via the configuration proposed in the design.
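The deadlock example can be sketched as a check over a wait-for graph: if following the chain of "who waits on whom" ever revisits a process, the modeled allocation scheme can deadlock. A minimal sketch with hypothetical process names:

```python
def has_deadlock(wait_for):
    """wait_for maps each process to the one it waits on (at most one).
    Following a chain back to a visited node means a cycle, i.e. deadlock."""
    for start in wait_for:
        seen = set()
        node = start
        while node in wait_for:
            if node in seen:
                return True     # cycle in the wait-for graph
            seen.add(node)
            node = wait_for[node]
    return False
```

A real resource-allocation model would be richer, but even this toy form shows how a design property can be checked long before any code exists.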
Only some of the above evaluations were executed via peer reviews. None of the above were code based. Neither of the above examples of evaluation was exhaustive. There are other evaluations of requirements and designs that can be applied as necessary. The key point is that a deliverable has been produced (e.g., a requirements document); before we can say it is now complete and ready for use in the next development step we need to evaluate it for the desired/expected characteristics. Doing this requires more sophistication than just doing peer reviews.
That is the essence of evaluation and test. A pre-defined set of characteristics, defined as explicitly as possible, is validated against a deliverable. For example, when you were in school and took a math test the instructor compared your answers to the expected answers. The instructor did not just say they look reasonable or they're close enough. The answer was supposed to be 9.87652. Either it was or it was not. Also, the instructor did not wait until the end of the semester to review papers handed in early in the course. They were tested as they were produced. With the stakes so much higher in software development, can we be any less rigorous and timely?
Among the items which should be evaluated and tested are Requirements Specifications, Design Specifications, Data Conversion Specifications and Data Conversion code, Training Specifications and Training Materials, Hardware/Software Installation Specifications, Facilities Installation Specifications, Problem Management Support System Specifications, Product Distribution Support System Specifications, User Manuals, and the application code. Again this is not a complete list. The issue is that every deliverable called for in your project life cycle must be tested.
The evaluation and test of a given deliverable may span multiple phases of the project life cycle. More and more software organizations are moving away from the waterfall model of the life cycle to an iterative approach. For example, a Design Specification might be produced via three iterations. The first iteration defines the architecture - is it manual or automated, is it centralized or distributed, is it on-line or batch, is it flat files or a relational data base, etc. The second iteration might push the design down to identifying all of the modules and the inter-module data path mechanisms. The third iteration might define the intra-module pseudo-code. Each of these iterations would be evaluated for the appropriate characteristics.
The types of evaluation and test must be robust. This includes, but is not limited to, verifying functionality, performance, reliability-availability-serviceability, usability, portability, maintainability, and extendibility.
In summary, each deliverable at each phase in its development should be evaluated/tested for the appropriate characteristics via formal, disciplined techniques.
3. THE JUSTIFICATION FOR A SEPARATE EVALUATION AND TEST KPA
There are five significant reasons which justify having a separate Evaluation and Test KPA: evaluation and test's role in accelerating the cultural change towards a disciplined software engineering process, the role of evaluation and test in project tracking, the portion of the development and maintenance budget spent on evaluation and test, the impact of evaluation and test disciplines on the time and costs to deliver software, and the impact of residual defects in software.
3.1 Accelerating Cultural Change
Electrical engineers and construction engineers are far more disciplined than software engineers. Electrical engineers produce large scale integrated circuits at near zero defect even though they contain millions of transistors. What is often lost in the widely discussed defect in the Pentium processor is that it was one defect in 3,100,000 transistors. When was the last time you saw software which had only one defect in 3,100,000 lines of code? The hardware engineers do not achieve better results because they are smarter than the software engineers. They achieve quality levels orders of magnitude higher than software because they are more disciplined and rigorous in their development and testing approach. They are willing to invest the time and effort required to ensure the integrity of their products. They recognize the impact that defects have, economic and otherwise.
Construction engineers face similar challenges in constructing skyscrapers. In their world a “system crash” means the building collapsed. In regions of the world which have and enforce strict building codes, that just does not happen. Again, this can be traced to the discipline of their development and testing approach.
Software, on the other hand, is a different matter. Gerald Weinberg's statement that “if builders built buildings the way software people build software, the first woodpecker that came along would destroy civilization” is on the mark.
We have to recognize that the software industry is very young as compared to other engineering professions. You might say that it is fifty years old, if you start with Grace Hopper as the first programmer. (A bit older if you count Ada Lovelace as the first.) However, a more realistic starting date is about 1960. That is just over thirty five years. By contrast, the IEEE celebrated their 100th anniversary in 1984. That means that in 1884 there were enough electrical engineers around to form a professional society. In 1945, by contrast, Ms. Hopper would have been very lonely at a gathering of software engineers.
As a further contrast, construction engineering goes back over 5,000 years. The initial motivation for creating nations was not self defense; it was the necessity to manage large irrigation construction projects. We even know the names of some of these engineers. For example, in 2650 BC Imhotep was the chief engineer for the step pyramid of Djoser (aka Zoser) in Egypt. In fact he did such a good job they made him a god.
The electrical engineers and construction engineers did not start out with inherently disciplined approaches to their jobs. The discipline evolved over many years. It evolved as they came to understand the need for discipline and the implications of defects in their work products. Unfortunately, we do not have thousands of years or even a hundred years to evolve the software profession. We are already building business critical and safety critical software systems. Failures in this software are causing major business disruptions and even deaths at an alarmingly increasing rate. (See “Risk To The Public” by Peter Neumann.)
Moving the software industry from a craftsman approach to a true engineering level of discipline is a major cultural shift. The objective of the CMM is, first and foremost, a mechanism for inducing this cultural change for software engineers. However, a culture does not change voluntarily unless it understands the necessity for change. It must fully understand the problems being solved by evolving to the new cultural paradigm. [1] This, finally, brings us to the role of testing in accelerating the cultural change to a disciplined approach (I know you were beginning to wonder when I would tie this together).
In the late 1960's, IBM was one of the first major organizations to begin installing formal software engineering techniques. This began with the use of the techniques espoused by Edsger Dijkstra and others. Ironically, it was not the software developers who initiated this effort. It was the software testers. The initial efforts were started in the Poughkeepsie labs under a project called “Design for Testability” headed by Philip Carol.
Phil was a system tester in the Software Test Technology Group. This group was responsible for defining the software testing techniques and tools to be used across the entire corporation. Nearly thirty years ago they began to realize that you could not test quality into the code. You needed to address the analysis, design, and coding processes as well as the testing process. They achieved this insight because as testers they thoroughly understood the problem since testing touches all aspects of software development. Testers inherently look for what is wrong and try to understand why.
It was this understanding of the problem and the ability to articulate the problem to developers that allowed for a rapid change in the culture. As improved development and test techniques and tools were installed, the defect rate in IBM's OS operating system dropped by a factor of ten in just one release. This is a major cultural shift occurring in a very short time, especially given that it involved thousands of developers in multiple locations.
The rapidity of the change was aided by another factor related to testing in addition to the problem recognition. This was the focused feedback loop inherent in integrating the testing process with the development process. As the development process was refined, the evaluation and test process was concurrently refined to reflect the new success criteria. As developers tried new techniques they got immediate feedback from testers as to how well they did because the testers were specifically validating the deliverables against the new yardstick.
A specific example is the installation of improved techniques for writing requirements which are unambiguous, deterministic, logically consistent, complete, and correct. Analysts are taught how to write better requirements in courses on Structured Analysis and courses in Object-Oriented Analysis. If ambiguity reviews are done immediately after they write up their first functional descriptions, the next function they write is much clearer out of the box. The tight feedback loop of write a function, evaluate the function, accelerates their learning curve. Fairly quickly the process moves from defect detection to defect prevention - they are writing clear, unambiguous specifications.
Contrast this to the experience of the software industry as a whole. The structured techniques and the object oriented techniques have been available for over twenty-five years (yes, O-O is that old). Yet the state of the practice is far behind the state of the art. The issue is that an organization does not fully accept or understand a solution (e.g., the software engineering tools and techniques) unless it understands the problem being solved. Integrated evaluation and test is the key to problem comprehension. “Integrated evaluation and test” is defined here as integrating testing into every step in the software development process. It is thus the key to the necessary feedback loops required to master a technique. Any process without tight feedback loops is a fatally flawed process. Evaluation and test is then the key to accelerating the cultural change.
3.2 The Role Of Evaluation And Test In Project Tracking
A project plan consists of tasks, dependencies, resources, schedules, budgets, and assumptions. Each task should result in a well defined deliverable. That deliverable needs to be verified that it is truly complete. If you do not evaluate/test the task deliverables for completeness you cannot accurately track the true status of the project.
For example, Requirements Specifications always seem to be “done” on schedule. This is because many organizations do not formally evaluate the Requirements Specification. Later in the project they find themselves completing the definition of the requirements during design, coding, testing, and even production. What, therefore, did it really mean to say that the task of writing the requirements was completed?
Incomplete “completed” tasks can also have a ripple effect on the completion status of subsequent tasks. In the above scenario, what is the impact of finding a requirements deficiency during code based testing? The “completed” Requirements Specification must be revised. The “completed” Design Specification must be revised. The “completed” code must be revised. The “completed” User Manuals must be revised. The “completed” Training Materials must be revised. The “completed” test cases must be revised.
The objective of project tracking is to give management and the project team a clear understanding of where the project stands. Without evaluation/testing integrated into every step in the project you can never be sure of what is and is not really completed. Given that Software Project Tracking and Oversight is a KPA and it depends on evaluation and test to perform the tracking, then evaluation and test as a KPA is a necessary preceding activity.
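The tracking point can be sketched in a few lines: a task counts as complete only when its deliverable exists and passes its evaluation, not when the calendar says so. The task names and checks below are hypothetical:

```python
def project_status(tasks):
    """tasks: list of (name, deliverable, evaluate) where evaluate(d) -> bool.
    A task is complete only if a deliverable exists and passes its evaluation."""
    done, still_open = [], []
    for name, deliverable, evaluate in tasks:
        if deliverable is not None and evaluate(deliverable):
            done.append(name)
        else:
            still_open.append(name)
    return {"complete": done, "incomplete": still_open}

tasks = [
    ("requirements", "spec v3", lambda d: "v3" in d),    # evaluated and passed
    ("design", "draft",         lambda d: "final" in d), # delivered, fails evaluation
    ("code", None,              lambda d: True),         # no deliverable yet
]
status = project_status(tasks)
```

Note how the design task would show as "done" on a schedule-only view, but the evaluation gate correctly reports it as incomplete.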
3.3 Evaluation and Test As A Percentage Of The Project Costs
A major pragmatic factor in determining what should and should not be a separate KPA is what portion of the software development budget and staff are involved in the activity. The more significant the activity is in these terms, the more focus it should receive.
There have been numerous studies documenting how project costs are allocated across the various activities. In these studies just the code based testing accounts for 35% to 50% of the project costs. This is true for both software development and for software maintenance. Factor in the effort to perform evaluations and this number is higher.
Organizations using any level of discipline in their testing have a tester to developer ratio of at least 1:3. More and more software vendors are moving to a 1:1 ratio. At times the NASA Space Shuttle project has had a ratio of 3:1 and even 5:1!
Simply put, any activity which consumes a third to a half of the budget and a fourth to a half of the resources should definitely be addressed by its own KPA.
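The arithmetic behind these figures is straightforward. With a hypothetical 40-person project and the ratios and percentages quoted in this section:

```python
# Hypothetical team size; the 1:3 ratio and 35-50% cost figures are the
# ones cited in the text.
developers = 30
testers = developers // 3                  # 1:3 tester-to-developer ratio
team = developers + testers
tester_share = testers / team              # testers as a fraction of total staff

project_cost = 1_000_000
test_cost_low = project_cost * 35 // 100   # 35% of project cost on testing
test_cost_high = project_cost * 50 // 100  # 50% of project cost on testing
```

Even at the conservative 1:3 ratio, testers are a quarter of the staff, which matches the "fourth to a half of the resources" claim.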
3.4 Impact Of Evaluation and Test On Development Schedules And Project Costs
Numerous studies show that the majority of defects have their root cause in problems with the requirements definition. In one study quoted by James Martin, over 50% of all software defects are caused by incomplete, incorrect, inaccurate, and/or ambiguous requirements. Even more telling is that over 80% of the costs of defects have their roots in requirements based errors.
Other studies show that the earlier you find a defect the cheaper it is to fix. A defect found in production can cost 2,000 times more than the same defect found in an evaluation of the requirements.
The issue is scrap and rework. This is the primary cause of cost and schedule overruns on projects. The plan may have identified the initial set of tasks to be done. However, due to defects found later, “completed” tasks must now be redone. The “re-do” task was not in the original plan. As the number of tasks requiring rework grows, the cost and schedule overruns accumulate. Integrating evaluation and test throughout the project life cycle minimizes scrap and rework, bringing the costs and schedules back under control.
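A rough cost comparison shows why scrap and rework dominate overruns. The 2,000x escalation factor is the one cited in this section; the unit cost and defect count are assumed for illustration:

```python
# Assumed unit cost; the 2,000x escalation factor is from the text.
fix_in_requirements = 100                        # defect caught during requirements evaluation
fix_in_production = fix_in_requirements * 2_000  # same defect found in production

defects_caught_early = 50
rework_avoided = defects_caught_early * (fix_in_production - fix_in_requirements)
```

Under these assumptions, catching just fifty requirements defects early avoids nearly ten million in downstream rework.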
Integrated evaluation and test can further shorten schedules by allowing for more concurrent activities. When Requirements Specifications are not formally evaluated, the design and coding activities often result in numerous changes to the scope and definition of the functions being delivered. For this reason, work does not start on the User Manuals and Training Materials until code based testing is well underway. Until then no one is confident enough in the system definition.
Similarly, poorly defined requirements do not provide sufficient information from which to design test scripts. The design and building of test cases often does not start until coding is well underway.
These two scenarios force the development process to be linear: requirements, then design, then code, then test, then write manuals. If the Requirements Specification is written at a deterministic level of detail (i.e., given a set of inputs and an initial system state you should be able to determine the exact outputs and the new system state by following the rules in the specification), then test case design and the writing of the manuals can go on concurrently with the system design. This in turn shortens the elapsed time required to deliver the system. However, creating deterministic specifications requires formal evaluation of that specification.
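A specification written at a deterministic level of detail can be expressed directly as a rule mapping inputs and an initial state to exact outputs and a new state. The withdrawal rule below is a hypothetical illustration, not from any real specification:

```python
def spec_withdraw(state, amount):
    """Deterministic rule: given the inputs and initial state, the exact
    output and new state are fixed. Withdraw succeeds iff amount <= balance."""
    if amount <= state["balance"]:
        return "DISPENSE", {"balance": state["balance"] - amount}
    return "REFUSE", state

# Because the rule is deterministic, this test case is derived from the
# specification alone, before any design or code for the system exists.
output, new_state = spec_withdraw({"balance": 100}, 40)
```

That is the property that lets test design and manual writing proceed in parallel with system design.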
In summary, integrated evaluation and test reduces schedules and project costs by minimizing scrap and rework and allowing more activities to be performed concurrently. These types of gains cannot be accomplished without integrated evaluation and test. Since time to market and cost to market are key issues for any software organization and testing is the key to achieving improvements in this area, evaluation and test should be a KPA.
3.5 The Cost Of Defects
The cost of defects is rising at an exponential rate. This has two causes. The first is that our dependence on software is greater than ever. When it fails, its impact is proportionate to that dependence. The second cause is litigation. There is a significant increase in the number of lawsuits concerning software quality. These are usually multi-million dollar exercises.
The support costs for software vendors are a growing concern. Microsoft receives almost 25,000 calls per day at an average cost per call somewhere between $50 and $100. This number is pre-Windows 95, which was expected to increase the volume by 4X. Sending out incremental bug fix releases also costs millions of dollars for some vendors. You also have to factor in the costs for developers to fix the defects and the opportunity loss caused by efforts going into fixing defects instead of creating new functionality.
Quality, and the lack thereof, also moves market share. Ashton-Tate went from being the industry leader in PC based data base software to being out of business due to large numbers of defects in one release of their main product. Market share for dBase went from 90%+ to less than 45%. Their acquisition by Borland did not stop the slide. Furthermore, only one year after the acquisition only 2% of all the people who had worked for Ashton-Tate still had jobs at Borland.
The direct costs of defects can be staggering for the end users of the software. Both United Airlines and American Airlines estimate that they lose $20,000 a minute in unrecoverable income when their reservation system goes down. A large manufacturer estimates they lose $50,000 a minute when their assembly line goes down. A large credit card company estimates they lose over $160,000 a minute when their credit authorization system goes down. Million dollar defects are now commonplace. For example, if GM has a defect in the firmware that requires reloading the control program in an EPROM, it could affect 2.5 million automobiles at an average cost to GM of $100 per car. There has even been an instance of a BILLION dollar loss due to a single defect. It was caused by a round-off error.
Some estimates place the average cost of a severity-one defect in production in the tens of thousands of dollars, and even the hundreds of thousands on some applications. You can do a lot of evaluation and test for $100,000. You could add an additional senior tester to the organization and, counting their salary and overhead costs, the break-even point occurs when they find one or two defects that would otherwise have slipped through to production.
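The break-even arithmetic above can be sketched as follows. The salary and defect-cost figures are assumptions for illustration only, not data from the text.

```python
# Illustrative break-even calculation for adding one senior tester.
# All dollar figures below are assumed for the sake of the example.

def breakeven_defects(annual_cost, avg_defect_cost):
    """Number of production-bound defects the tester must catch per
    year to pay for themselves, rounded up (ceiling division)."""
    return -(-annual_cost // avg_defect_cost)

# Assumed fully loaded cost of a senior tester: $150,000/year.
# Assumed average cost of a severity-one production defect: $100,000.
print(breakeven_defects(150_000, 100_000))  # -> 2
```

With the assumed figures, catching just two severity-one defects per year before they reach production pays for the position, which is consistent with the "one or two defects" break-even point cited above.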
When you are dealing with safety-critical systems, how do you cost out the value of a human life? There have been hundreds of deaths due to software defects. With software playing a bigger role in transportation and in the medical profession, the risk of deaths is rapidly increasing. [2]
The legal profession is beginning to take note of these costs. Many feel we should be held to the same standards as other engineering professions. This creates exposure to software product liability and professional malpractice claims. The financial exposure in such suits is enormous. To date, legal precedents in this area are still in a state of flux. However, the trend is clear: software professionals and their products will be held to the same standards of care and professionalism as other engineers and their products.
Currently, most of the lawsuits related to software quality are being brought to court on the grounds of breach of contract. We (Bender & Associates) have been involved in a number of these as expert witnesses. We have never lost a case. This is because in each instance we have been on the side of the user of the software, not the producer.
Few software vendors can demonstrate that they have applied a reasonable level of due diligence in the evaluation and test of their software. The emphasis at most vendors is on dates and functionality, not quality. The result is that in half of the cases in which we have testified, the vendor has gone out of business as a direct result of the cost of litigation and the cost of the award to the customer.
If the CMM addressed the medical profession, there is no doubt that the avoidance of malpractice suits would be a KPA. This issue is now on our doorstep as software professionals. It requires a disciplined approach to evaluation and test to minimize this exposure.
The net result is that the direct and indirect costs of defects are already huge and rising dramatically. Defect detection and defect avoidance require fully integrated evaluation and test. This alone is sufficient to justify an evaluation and test KPA.
4. THE PROPOSED SOFTWARE EVALUATION AND TEST KPA
Evaluation is the activity of ensuring the integrity of the various system specifications and models produced during the software development process. Testing is the machine-based activity of executing tests against the code. The purpose of Software Evaluation and Test is to validate (i.e., is this what we want?) and verify (i.e., is this correct?) each of the software project deliverables, identifying any defects in those deliverables in a timely manner.
Software Evaluation and Test involves identifying the deliverables to be evaluated/tested; determining the types of evaluations/tests to be performed; defining the success criteria for each evaluation/test; designing, building, and executing the necessary evaluations/tests; verifying the evaluation/test results; verifying that the set of tests fully cover the defined evaluation/test criteria; creating and executing regression libraries to re-verify deliverables that have been modified; and logging, reporting, and tracking defects identified.
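One of the activities listed above, creating and executing regression libraries to re-verify modified deliverables, can be sketched minimally as follows. The decorator-based registry and the sample tests are illustrative assumptions, not part of the proposal.

```python
# Minimal sketch of a regression library: tests are registered once,
# then re-run whenever the deliverable they cover is modified.
# The registry mechanism and sample tests are invented for illustration.

regression_library = []

def regression_test(func):
    """Register a test function in the regression library."""
    regression_library.append(func)
    return func

@regression_test
def test_addition():
    assert 2 + 2 == 4

@regression_test
def test_string_upper():
    assert "abc".upper() == "ABC"

def run_regression():
    """Execute every registered test; return (passed, failed) counts."""
    passed = failed = 0
    for test in regression_library:
        try:
            test()
            passed += 1
        except AssertionError:
            failed += 1
    return passed, failed

print(run_regression())  # -> (2, 0)
```

The point of the registry is that re-verification after a change is a single call rather than an ad hoc selection of tests, which is what makes regression coverage repeatable.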
The initial deliverable to be evaluated is the software requirements. Subsequently, the majority of the evaluation and test is based on the validated software requirements.
The software evaluation and test may be performed by the software engineering group, one or more independent test organizations, and/or the end users or their representatives.
4.1 Goals
________________________________________________________________________
Goal 1 Quantitative and qualitative evaluation/test criteria are established for each of the software project deliverables.
Goal 2 Evaluations/tests are executed in a timely manner to verify that the success criteria have been met.
Goal 3 Evaluation/testing is sufficiently effective to minimize the impact of defects, such as scrap and rework during development and operational disruptions after implementation.
Goal 4 Defects and other variances identified are logged and tracked through to their successful closure.
4.2 Commitment To Perform
________________________________________________________________________
Commitment 1 The project follows a written organizational policy for evaluating/testing the software project deliverables.
This policy typically specifies:
1. The organization identifies a standard set of software project deliverables to be evaluated/tested, the characteristics to be evaluated/tested, and the levels of verification criteria to be considered.
______________________________________________________
| |
Examples of deliverables to be evaluated and tested include:
- requirements specifications,
- design specifications,
- user manuals,
- training materials,
- data conversion specifications and support systems, and
- code.
|_____________________________________________________|
______________________________________________________
| |
Examples of characteristics to evaluate/test for are:
- functional integrity,
- performance, and
- usability.
|_____________________________________________________|
______________________________________________________
| |
Examples of levels of verification criteria are (using code based testing as the example):
- 100% of all statements and branch vectors;
- 100% of all predicate conditions;
- 100% of all first order simple set-use data flows; and
- 100% of all first order compound set-use data flows.
Examples of levels of verification criteria are (using requirements based testing as the example):
- 100% of all equivalence classes;
- 100% of all functional variations; and
- 100% of all functional variations, sensitized to guarantee the observability of defects.
|_____________________________________________________|
2. The organization has a standard set of methods and tools for use in evaluation/testing and defect tracking.
3. Each project identifies the deliverables to be evaluated/tested, the phase(s) in which they will be evaluated/tested, and how they will be evaluated/tested in each phase.
4. Evaluations and tests are performed by trained testers.
5. Evaluation and testing focus on the software project deliverables, not on the producer.
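The levels of verification criteria in the boxed examples above (e.g., 100% branch coverage, 100% of equivalence classes) can be illustrated with a toy instrumented function. The function, its branches, and the test cases are invented for illustration; real projects would use a coverage tool rather than hand instrumentation.

```python
# Toy illustration of one verification-criteria level: 100% branch
# coverage of a small function, with one test per equivalence class.
# The function and its branch labels are invented for this sketch.

branches_hit = set()

def classify(x):
    if x < 0:
        branches_hit.add("neg")      # true branch of predicate 1
        return "negative"
    branches_hit.add("nonneg")       # false branch of predicate 1
    if x == 0:
        branches_hit.add("zero")     # true branch of predicate 2
        return "zero"
    branches_hit.add("pos")          # false branch of predicate 2
    return "positive"

# One test per equivalence class (x < 0, x == 0, x > 0) happens to
# exercise every branch of this particular function.
for case in (-1, 0, 5):
    classify(case)

all_branches = {"neg", "nonneg", "zero", "pos"}
coverage = len(branches_hit) / len(all_branches)
print(f"branch coverage: {coverage:.0%}")  # -> branch coverage: 100%
```

Note the distinction the boxed examples draw: equivalence classes are a requirements-based criterion, while branch coverage is code-based; here the two coincide only because the function is trivial.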
Commitment 2 Senior management supports and enforces the requirement that projects must meet their pre-defined success criteria before installation into production in the users'/customers' environment.