Gaga Rigs

Linux and QA: The Era of Playing Around is Over

I boot the PC from a Linux live CD. It gives a kernel panic. I try another live CD, same thing. I try the Acronis live CD and, you guessed it, kernel panic. This is a common HP desktop made in 2005. It has the media drive thingie. The fact that the famous “thousands of eyes” across the world who use and test Linux couldn’t track down this glaring bug and fix it in five years is rather ironic.

Mind you, this is a kernel-level issue. This is not an X.org bug, a Nautilus bug, an ALSA bug or a bug in the openSUSE package manager that I found within five minutes of using it. Moreover, this is not a bug caused by distros modifying the vanilla kernel. So we have the picture: the kernel has a bug that makes it crash (very early in the boot process) on a commodity, off-the-shelf desktop used by millions. (Okay, maybe not millions, but it’s a common desktop model.)

Why is this so? Well, the answer is twofold. Firstly, the final responsibility for the product lies with the OEM, the vendor of the product. Apple cannot say, “You know, a BSD or Broadcom update broke my library, sorry, go ask upstream to fix it.” The maker of the OS is the party responsible to the consumer. Where and how they manage their sources is irrelevant. Asus is responsible for their motherboards, and if a capacitor manufacturer has a defect in their product, Asus can’t play ping pong with the user of the motherboard.

With commodity computers that you buy from a store, the maker is the one who stands behind the product, including the OS and the apps. So, if you have a problem with Windows, you don’t call Microsoft, you call the OEM. If you buy a retail version of the OS and install it yourself, then MS is the one you contact. With Linux, you can get optional support that vendors like Red Hat, Oracle and Canonical provide. Other than that, the distro maker gives no explicit warranty for their product. When something happens, they point fingers at upstream. Fine, but since almost all distros modify upstream packages, they ruin the main advantage of open source, which is that everyone can use off-the-shelf software instead of reinventing the wheel. When distros make changes, they take the responsibility onto their own shoulders, because upstream can’t look into each of 300 distros, every one with numerous versions of its own, and track their bugs and regressions.

In fact, I don’t understand what a software warranty can mean in Linux land. Since Red Hat and everyone else just package stuff from upstream, the best they can do is patch it on the spot or wait for upstream to patch it. They don’t have control over the software, since they don’t make it. The baker doesn’t make the cookie cutters. If the shapes have a flaw, he’ll essentially have to stop making the cookies and give people their money back. If I buy RHEL and pay for the support and my common HP desktop gets a kernel panic after three seconds of loading it, what the hell are they going to do? If they had issued a patch, that patch would have been integrated into the mainline years ago. The fact that I was getting this in 2010 means that they didn’t. So that means anyone who tried to use Linux on that desktop just gave up.

 

Two biggest problems and how to address them

Lack of standardization and  lack of testing

 

The reason that Windows has such great hardware and software support is not just that Windows driver support is virtually mandatory for every piece of hardware. It’s also that Windows follows standards that make life easier. Azalia is a standard, so is AHCI. You essentially write a generic ALSA driver and call it good. The one shipped with Windows Vista and up works 100% of the time. With Linux, not so. The mess that is ALSA, PulseAudio and GStreamer causes endless problems that would vanish simply by following standards. The same goes for generic video support. I can put XP on my Mac mini, a ten-year gap between the software and the hardware, and the video would load without a hitch. It won’t be the full driver, but I’ll at least get a GUI. Linux drivers use KMS by default, and that is broken on many GPUs (mine included). The Linux development model is like a C student bringing the wrong book to an open-book test.

The fact that Linux gets less developer attention is unfortunate. It means that hardware support is not going to be as good, because the hardware vendors are not going to put in as much effort as they do with Windows. The same goes for software vendors. But what makes things much worse is the lack of standards, which needlessly creates bugs. The exact same Intel GPU driver works on one distro but bombs on another. So the solution is to stay as close to the main upstream as possible and make sure that patches are only extensions and don’t actually “change” the package itself. In other words, Nautilus has bugs; don’t make new bugs with your stupid customizations.

One of the few problems I had on my desktop at work was a boot failure due to Ubuntu changing the drive naming and hence messing up the mount points in fstab. This is ridiculous. It is known that kernel updates often change the naming scheme for devices. The comments section in fstab even tells you that it’s better to use UUIDs. And yet the default install just merrily writes device names into fstab without warning the user. Clearly the designers of Ubuntu don’t understand software safety.
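To make the fstab point concrete, here is a rough sketch in Python of the kind of check an installer could run before writing the file. It is an illustration, not anything Ubuntu actually ships: the function name `find_unstable_mounts`, the list of "stable" prefixes and the sample fstab (including its UUID value) are all made up for the example.

```python
from typing import List, Tuple

# Device specifiers that survive a change in kernel device naming.
# This list is an assumption for the sketch, not an exhaustive rule.
STABLE_PREFIXES = (
    "UUID=", "LABEL=", "PARTUUID=", "PARTLABEL=",
    "/dev/disk/by-", "/dev/mapper/",
)


def find_unstable_mounts(fstab_text: str) -> List[Tuple[int, str, str]]:
    """Return (line number, device, mount point) for every fstab entry
    that refers to a raw device name such as /dev/sda2."""
    findings = []
    for line_no, raw in enumerate(fstab_text.splitlines(), start=1):
        line = raw.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        fields = line.split()
        if len(fields) < 2:
            continue  # malformed entry, not our job here
        device, mount_point = fields[0], fields[1]
        if device.startswith("/dev/") and not device.startswith(STABLE_PREFIXES):
            findings.append((line_no, device, mount_point))
    return findings


# Hypothetical fstab contents; the UUID is an invented example value.
SAMPLE = """\
# /etc/fstab: static file system information.
UUID=0a3407de-014b-458b-b5c1-848e92a327a3 /     ext4 errors=remount-ro 0 1
/dev/sda2                                 /home ext4 defaults          0 2
/dev/sdb1                                 none  swap sw                0 0
proc                                      /proc proc defaults          0 0
"""

if __name__ == "__main__":
    for line_no, device, mount_point in find_unstable_mounts(SAMPLE):
        print(f"line {line_no}: {device} on {mount_point} should use UUID=")
```

On a real system you would feed it the contents of /etc/fstab and look up the matching UUIDs with blkid. The point is only that the check is trivial to automate.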

On-the-fly fallback is another thing. There are known bugs that affect common hardware. (I’m not talking about some weird, obscure audio codec.) 95% of computers use common hardware. If the bug is known, the kernel should auto-detect the hardware and work around it. KMS does not work for the Nvidia 320M. It loads scrambled garbage. Why the heck can’t the kernel detect that I have that chip and just disable KMS without me typing nomodeset? Windows proves that you can load an OS without driver support. Get it done. Lack of driver support is not an excuse for an X.org crash when starting in a virtual machine. C’mon guys, stop goofing around.
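The fallback idea boils down to a lookup table keyed on hardware IDs. The sketch below shows the shape of it in Python. It is purely illustrative and says nothing about how the kernel or any distro actually implements quirks: the `Quirk` type, the `KNOWN_QUIRKS` table and `boot_params_for` are hypothetical names, and the device ID is a placeholder, not the real ID of the 320M (only the Nvidia vendor ID 0x10DE is real).

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple

NVIDIA_VENDOR_ID = 0x10DE        # real PCI vendor ID for Nvidia
PLACEHOLDER_DEVICE_ID = 0x1234   # placeholder, NOT the actual 320M device ID


@dataclass(frozen=True)
class Quirk:
    description: str
    boot_params: Tuple[str, ...]


# Hypothetical table of known-bad hardware and the workaround to apply.
KNOWN_QUIRKS: Dict[Tuple[int, int], Quirk] = {
    (NVIDIA_VENDOR_ID, PLACEHOLDER_DEVICE_ID): Quirk(
        description="KMS shows scrambled output on this GPU",
        boot_params=("nomodeset",),
    ),
}


def boot_params_for(detected: List[Tuple[int, int]]) -> List[str]:
    """Collect workaround boot parameters for the detected
    (vendor_id, device_id) pairs, without duplicates."""
    params: List[str] = []
    for ids in detected:
        quirk = KNOWN_QUIRKS.get(ids)
        if quirk is None:
            continue  # no known problem with this device
        for param in quirk.boot_params:
            if param not in params:
                params.append(param)
    return params


if __name__ == "__main__":
    # A machine with the problem GPU plus some unrelated device.
    machine = [(NVIDIA_VENDOR_ID, PLACEHOLDER_DEVICE_ID), (0x8086, 0x0001)]
    print(boot_params_for(machine))
```

A live CD could run something like this against the detected PCI devices and boot with the workaround already applied, instead of making the user guess at nomodeset.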

 

Lack of Testing

 

The lack of testing for Linux is truly breathtaking. The fact that new versions of Ubuntu break sound or X.org (can’t use the “we don’t have the drivers” excuse anymore) is notorious. What’s crazier is that the hardware breakages are on hardware that is used by millions of people. Either the distro makers don’t fix the reported breakages or the breakages are not reported. What’s obvious is that these guys don’t test their stuff before shipping. I’m talking about out-of-the-box breakages, not “when the planets align” breakages. About a year or two ago Ubuntu shipped a release where the sound stopped working on one third of all netbook models. I’m also guessing these netbooks had the aforementioned Azalia sound chips. This is moronic.

The lack of testing of the software is also endemic. Not only are there constant hardware breakages, but the pure software components also suffer from out-of-the-box bugs. When you load openSUSE 12.1 and download a package from the built-in browser, the package manager gets called and then says “not found”. Really, that took me five minutes of testing. I had to install the package by hand from the terminal. I tested Debian today to see if the long release cycle and the testing that they do pay off. Nope. Nautilus has a bug where your network mount points are invisible to other programs. This is a GNOME bug from years ago, and this Debian was released in February 2011. When you want to mount from the command line, it gives an error about block devices. You need to install smbfs. If you can’t handle virtual folders, don’t effing do it. At least give an option to mount in a real location. Again, this is not an obscure function. It’s a common thing whose bugs can be found with only light testing.

So, here is a simple solution. Don’t ship until all the critical bugs are fixed. Start small. Test your release on all the popular Mac models. If you get an out-of-the-box breakage, go and fix it. Basically, test your software on the most commonly used computer models first and then work down to the less common ones. You can’t ship with no sound on the most popular sound chips.
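As a sketch of what “start with the most common models” could look like as a release gate, here is a small Python example. Everything in it is hypothetical: the `ModelResult` record, the `release_blockers` function, the 80% coverage target and the sample numbers are invented for illustration, not taken from any distro’s actual process.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class ModelResult:
    model: str
    user_share_pct: int   # share of the user base, in whole percent
    critical_bugs: int    # open out-of-the-box breakages on this model


def release_blockers(results: List[ModelResult],
                     coverage_target_pct: int = 80) -> List[str]:
    """Walk the models from most to least popular until the coverage
    target is reached, and return the ones that still have critical bugs."""
    ordered = sorted(results, key=lambda r: r.user_share_pct, reverse=True)
    blockers: List[str] = []
    covered = 0
    for result in ordered:
        if covered >= coverage_target_pct:
            break  # the long tail does not block the release
        covered += result.user_share_pct
        if result.critical_bugs > 0:
            blockers.append(result.model)
    return blockers


# Invented sample data for the sketch.
SAMPLE_RESULTS = [
    ModelResult("model-d", 5, 0),
    ModelResult("model-a", 50, 0),
    ModelResult("model-c", 15, 1),
    ModelResult("model-b", 30, 2),
]

if __name__ == "__main__":
    blockers = release_blockers(SAMPLE_RESULTS)
    print("ship it" if not blockers else f"blocked by: {blockers}")
```

With the default target, only the models that make up the first 80% of users can block the release. Raising the target to 100 makes every tested model count.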

There should also be out-of-the-box testing for pure software components. If I can find a critical bug in ten minutes, then you have failed at basic testing. Linux advocates should be honest and admit that Linux is neither reliable nor stable. The hardware support is buggy and the software components are also buggy. When I say buggy I don’t mean that the Compiz rendering is jerky, I mean that a piece of software has a glaringly broken function that was missed by testers. The stupid claim that Linux works on servers for years without crashing is deceptive, because servers are babysat by IT staff who patch bugs, and they run repetitive workloads on one set of hardware. They run apps headless, so no GUI framework or toolkit bugs. Servers are just dumb behemoths, highly optimized to churn data. Making a better server OS is not going to fix the issue of closing a window crashing X.org and losing all your apps, or your wifi signal being dropped, or even a USB driver bug causing a kernel panic. I was watching an Oracle dev give a talk saying that since Oracle runs thousands of machines on diverse workloads, bugs cannot slip through easily. They are optimizing Xen or Btrfs or whatever. The fact that GRUB, the effing bootloader that the entire free Unix ecosystem depends on, has bugs because of lack of manpower doesn’t concern them. Today I tried CentOS, the “stable” Linux distro. It loads a broken framebuffer. The fact that an Nvidia chip used by millions of people doesn’t work with their stable release doesn’t make them feel guilty. It blows my mind that any developer would waste even a second of their valuable time on snow effects in Compiz when GRUB or Samba has serious problems.

 

Play to your strength

 

Commercial software vendors need to maintain many branches of their software, in part because people are not always willing to pay for the latest version. That is leaving aside the cases where the user simply doesn’t want to migrate to the new branch and stalls until it becomes unfeasible, or is a grandma type still using Win98 out of utter unwillingness to relearn a few things. In other words, many people don’t want to be early adopters and prefer to wait a little for the software to mature. So you need to maintain multiple branches for at least some period of time.

Many standalone apps don’t have this problem. The creator of VLC is not going to waste time backporting features and releasing bug fixes for older versions when the user can just update to the newest one. MS Office, on the other hand, has a pretty long support cycle for multiple branches (2003, 2007, 2010). Apple is actually pretty breakneck in terms of dropping support. It would come as a shock to many that 10.5, released only in late 2007, receives no new updates. But the fact remains that no sane person actually wants to use Vista instead of 7 (Vista actually sucks balls quite hard compared to 7). The issue is not the upgrade hurdle, it’s that people don’t want to shell out two hundred dollars.

With free software I don’t see the point in keeping long support for old software. It requires a lot of man-hours. This is especially true for point releases. If I’m using KDE 4 I would want the latest one, say, 4.8, over a patched version of KDE 4.3. Distros should put certain components on a rolling release system instead of locking the user to one branch. Since backwards and forwards compatibility is crappy on the Linux desktop anyway, it should be made easy for people to upgrade between point releases. One can argue that some people want to use MATE instead of GNOME 3, but who the hell says that they want to use 2.6.2x instead of just getting the latest kernel? I don’t. The other option is to make it easy to upgrade from distro release to distro release. But that takes a huge amount of effort to actually pull off, and hitherto the only company that has pulled it off is Apple. Right now the software that is released with each distro is locked to that version. So you are forced to upgrade your distro to get the latest Firefox or LibreOffice. The bug I was talking about in the GNOME 2 that Debian uses is fixed in GNOME 3. GNOME 2 does not receive any new updates, so it would behoove developers to enable users to upgrade more easily. Kernel updates are a no-brainer. Almost everyone wants the latest stable kernel, and they also want the latest app stack. As of now this means upgrading the whole distro twice a year.

 

Who butters your bread

 

If you look at the Linux kernel and the surrounding open source technologies, the feature list is jaw-dropping. IBM, Google and Oracle pour millions into making virtualization and scalability stellar for their enterprise machines. So in essence, Linux has amazing enterprise features because people with deep pockets pay full-time devs to make these things happen. Desktop support sucks balls because no company has an incentive to make their hardware or software support better for Linux. It’s actually amazing that vendors spend money on writing Linux drivers at all. Linux only has a 1% market share, so who cares if my wireless driver is going to blow chunks? Linux is very lucky that things like sound, IDE, SATA, USB and FireWire use generic standards. If every chipset needed a custom USB driver, Linux driver support would have been abysmal.

 

 
