Cluster Failures

No, “cluster failures” is not a euphemism for a problem encountered in the military service and widely known to be caused by officers with clusters on their shoulders. Nor does this page relate to failures of computing clusters.

This is about failures of independent computer systems and other devices that occur very close together in time. In some cases, the failures may also be of a similar type, such as a series of hard drive or motherboard failures.

And recently I really have had a cluster of problems, some with computers and some not. Actually a cluster of clusters.

I have been repairing things for about sixty years. I started with TVs, ours and our friends’ and neighbors’. Then I worked at a couple now-defunct audio stores. I went on to fixing unit record equipment and computers for IBM. And when personal computers started hitting the streets, I started fixing those as well, both hardware and software.

Cluster failures are a real thing. When working at the audio stores, there would be days or weeks when the most common things people brought in for repair were receivers. And they were frequently similar failures such as the power output stage, or the IF stage. Other days it would be turntables with broken belts or component tuners that would not tune.

When I worked for IBM, there was one week when I fixed three punched card readers with nearly identical feed failures. And later, when working at the IBM PC National Support Center, there were days when most of the calls I fielded were memory (RAM) problems.

Sure, randomness abounds in the universe, including the failures of electronic and electromechanical devices. But sometimes that randomness expresses itself in clusters of failures.
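To see how purely random failures can still bunch up, here is a minimal simulation sketch. Everything in it is an assumption made for illustration: the number of devices, the daily failure probability, and the function name are all made up, and none of it is measured data. It gives each device a small independent chance of failing each day, then compares the busiest seven-day window with the average week.

```python
import random


def busiest_week(num_devices=20, days=365, daily_failure_prob=0.002, seed=1):
    """Simulate independent random failures and find the worst 7-day window.

    All parameters are hypothetical; they are not taken from real failure data.
    Returns (total failures in the period, failures in the busiest week).
    """
    rng = random.Random(seed)

    # Each device fails on a given day with the same small, independent probability.
    failures_per_day = [
        sum(1 for _ in range(num_devices) if rng.random() < daily_failure_prob)
        for _ in range(days)
    ]

    # Slide a 7-day window across the period and count failures in each window.
    weekly_counts = [sum(failures_per_day[d:d + 7]) for d in range(days - 6)]
    return sum(failures_per_day), max(weekly_counts)


if __name__ == "__main__":
    total, worst = busiest_week()
    print(f"failures in the period: {total}")
    print(f"average per week:       {total / 52:.2f}")
    print(f"busiest 7-day window:   {worst}")
```

Trying a few different seeds will typically show a busiest week well above the weekly average, even though nothing in the model links one failure to another.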

The Church Cluster

It all began on Wednesday, September 9, when the server at my church began failing; really crashing hard. The computer we use as a firewall was also failing with intermittent crashes. In addition, one of the office computers froze up and the telephone system began failing.

It took a couple days for this to play out and to repair the computers. I don’t do the phone system; someone else does that.

The server, a donated Dell, was clearly having hard drive problems. There were specific errors on the console pointing to one of the hard drives. Murphy rules, and there were no errors recorded in the logs because the failing hard drive was where the logs were kept. And because I was not on site, I had to rely on the person who rebooted for me, who could not read the errors on the console: the display had timed out into power saving mode and pressing keys on the keyboard did not wake it.

In addition, the computer we use for a firewall was locked up so I rebooted it.

I took the server home to rebuild and by the next day had mostly completed that task, but not without discovering serious hardware issues. One of the hard drives had failed catastrophically. That was easy enough to fix. But in attempting to install a new operating system on the replacement hard drive, it became obvious that there were other problems. The motherboard was also failing and it was impossible to boot or even to get through the BIOS POST. So I installed the spare Dell motherboard we kept on hand for just this event, and was able to proceed.

I was able to restore the data for our web site and email servers from the good and well-tested backups I designed. I was also able to restore the data for the DHCP and name service (BIND) servers.

However, soon after I returned home with the server that I needed to rebuild, a different computer, the firewall, began locking up more frequently. The office staff were still able to get out to the Internet so long as that firewall was working, but our web site and email were down because they are all housed on our server. Without the firewall, all external access was gone.

The next day, Thursday, I returned to the church and installed the rebuilt server, which worked fine.

However, after installing the server, the firewall started failing so frequently that I could not leave the premises before it would do so again. I made a quick trip back home to obtain a spare computer that had been given to me and had been used as a firewall itself. I installed it at the church, made a couple simple configuration changes, and the replacement firewall was up and running.

The Home Cluster

While all of that was going on at the church, my home network was also embroiled in a cluster of problems.

First, my own server started failing. One of the four 1GB memory DIMMs had failed. One of my workstations had a motherboard failure, and another system developed a defective power supply. A fan then failed on my server, and a video adapter failed on a different workstation. And, oh yes, a hard drive failed on my own workstation. And then a hard drive failed on my server.

And don’t get me started on my refrigerator and car.

What’s it all about, Alfie?

All of this took place within the space of a week, both at church and home. So it was a very trying week.

But what does it mean?

Well, as much as we like to assign meaning to things, there really is none. Things fail. Most of the time they work for years without a problem. Sometimes the failures are spread out evenly over time, and sometimes many things seem to fail at once.

So, sometimes when you get something fixed one day and it fails with another problem the next, that is just the randomness of the universe in which we live.

And now, weeks after the events described, all is well with the computers at church and at home, and with my car and fridge, until the next time.

Disk errors – server migration in progress

Due to an accumulation of hard drive errors on this web and email server I am preparing a new server to take over from this one.

Over the past couple weeks, the SMART function of the hard drive installed on the server has been reporting steadily larger numbers of permanently unreadable disk sectors. So far this has caused a couple minor software crashes that did not take down the whole server, just one or two of the running server functions.
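As a rough illustration of keeping an eye on those numbers, here is a minimal sketch that pulls the sector-related counters out of an attribute table in the layout that smartctl -A prints. The sample table and its values are invented for this example and are not from my drive. The three attribute names are common ones, but I am assuming they are the relevant ones; which attributes a given drive reports varies by vendor.

```python
# Attributes commonly associated with bad sectors. Assumption: the drive
# reports these names; not every vendor does.
WATCHED = {"Reallocated_Sector_Ct", "Current_Pending_Sector", "Offline_Uncorrectable"}

# Invented sample in the layout of a `smartctl -A` attribute table.
SAMPLE = """\
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
  5 Reallocated_Sector_Ct   0x0033   100   100   036    Pre-fail  Always       -       24
  9 Power_On_Hours          0x0032   061   061   000    Old_age   Always       -       34567
197 Current_Pending_Sector  0x0012   100   100   000    Old_age   Always       -       8
198 Offline_Uncorrectable   0x0010   100   100   000    Old_age   Offline      -       8
"""


def bad_sector_counts(attribute_table):
    """Return {attribute name: raw value} for the watched attributes."""
    counts = {}
    for line in attribute_table.splitlines():
        fields = line.split()
        # Attribute rows start with a numeric ID and have at least ten columns.
        if len(fields) < 10 or not fields[0].isdigit():
            continue
        name, raw = fields[1], fields[9]
        if name in WATCHED:
            try:
                counts[name] = int(raw)
            except ValueError:
                pass  # skip raw values that are not plain integers
    return counts


if __name__ == "__main__":
    for name, value in sorted(bad_sector_counts(SAMPLE).items()):
        print(f"{name}: {value}")
```

Saving the counts from each run and comparing them over time is one simple way to notice the kind of steady growth described above.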

So in the interest of a smooth transition, I have started work on a new server by installing CentOS 6 on a computer I had doing some minor functions that were fun but not necessary. I will be migrating server functions from the old server to the new over the next couple weeks. I do expect there to be some very short periods of down time, but none should last more than a few minutes.

If you encounter difficulty with accessing my web sites or sending me email, please be patient and try again in a few minutes.

Thanks for your patience.

A Quick Look at Fedora 23

This is only the second day of the general availability of Fedora 23, so this is not a full-on review. It is just a quick look at what I have experienced so far.

I was so excited to try Fedora 23 that I broke my own rules and installed it directly on my primary workstation without so much as a test in a VM.

Upgrade

Yes, I was able to upgrade from Fedora 21 to Fedora 23. I have not had a successful upgrade in years and have had to resort to complete reinstallations — while saving home directory data, of course. The old fedup program never worked for me.

The network-based dnf system upgrade procedures worked very well with only one minor glitch.

I installed the dnf upgrade plugin, used the dnf system-upgrade download command to download the packages required to perform the upgrade on my system, and then rebooted to perform the upgrade.

The only problem I had was that the download procedure did not download or install the Fedora 23 public signing key. I installed that manually and the rest of the procedure worked just fine.
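The steps described above can be sketched as a short list of commands. This is a dry-run sketch only: it builds and prints the commands rather than running them, and the function name is made up for illustration. The package name and the --releasever option are how I understand the system upgrade plugin to work, so check the current Fedora documentation before relying on them. The manual signing key step is left as a comment because the exact command depends on where the key file lives on your system.

```python
import shlex


def upgrade_plan(target_release):
    """Build the command sequence for a dnf system upgrade (nothing is executed).

    Hypothetical helper for illustration; run the printed commands as root
    only after verifying them against the Fedora documentation.
    """
    return [
        # Install the plugin that provides the system-upgrade subcommand.
        ["dnf", "install", "dnf-plugin-system-upgrade"],
        # Download all packages needed for the target release.
        # (If the download complains about a missing signing key, that key
        # has to be imported manually before repeating this step.)
        ["dnf", "system-upgrade", "download", f"--releasever={target_release}"],
        # Reboot into the offline upgrade.
        ["dnf", "system-upgrade", "reboot"],
    ]


if __name__ == "__main__":
    for command in upgrade_plan(23):
        print(shlex.join(command))
```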

All in all, it took a little under 3 hours to perform the upgrade, in large part because I have a lot of things installed for testing purposes.

Plasma 5

In my look at Fedora 22, I was very critical of the state of the Plasma 5 desktop environment because it was far from complete and there were many issues that prevented me from doing the daily work that I required.

Plasma 5 in Fedora 23 is far more complete and well polished. It still has a few minor rough edges, but everything works as I expect it to. I still do not care much for the default Breeze icon set, but at least now I can change to a different set using the System Settings. Despite that, I am using the default set so I can spend enough time to give them a fair test.

Too Early For Conclusions

It is way too early for any final conclusions about Fedora 23. However, my brief experience so far leads me to predict that this will be an excellent release of this staple desktop OS. And so far I have only installed the desktop version and not the server version.

I will try to post a more complete review in the News and Reviews section of this web site when I have more experience with it and some time available to do so.

SystemV startup vs systemd: My presentation at All Things Open

I will be presenting the talk, SystemV startup vs systemd, at All Things Open on Monday, October 19th at 3:25pm in room 305B.

systemd is a controversial replacement for the init daemon and SystemV start scripts that is now used by many important distributions. My presentation will cover some of the differences between these two startup systems as well as some basic usage information needed by anyone getting started with systemd.

I hope to see you there.

My “All Things Open” Talk

I will be presenting the talk, SystemV startup vs systemd at All Things Open on Monday, October 19th at 3:25pm. I do not yet know which room I will be in, but that should be available on the schedule when you get to the conference.

systemd is a controversial replacement for the init daemon and SystemV start scripts that is now used by many important distributions. My presentation will cover some of the differences between these two startup systems as well as some basic usage information needed by anyone getting started with systemd.

I hope to see you there.

David Both to present at All Things Open

I will be presenting at least one talk at All Things Open this October 19th and 20th.

The one talk that has been accepted so far is “SystemV startup vs systemd”. systemd is a controversial replacement for the init daemon and SystemV start scripts that is now used by many important distributions. My presentation will cover some of the differences between these two startup systems as well as some basic usage information needed by anyone getting started with systemd.

I will post more details about the specific date and time when I am notified of that information.

I hope to see you there.

KDE Plasma 5 Disappoints in Fedora 22

Although Fedora is still my distro of choice, KDE Plasma 5 (KP5) is a real disappointment and makes Fedora 22 unusable for me. It is reminiscent of the switch from KDE 3.5 to KDE 4, where many things did not work and others were simply missing.

Understand that I am a KDE fanboy; it is my favorite desktop. But KP5 is unusable for me. Even after spending days trying to make it meet my needs, I was unable to feel even marginally comfortable with it.

Many of the widgets that were present in KDE Plasma 4 are now missing, including a few that I find really useful, such as the Konqueror profiles widget, which enables me to use four default profiles for Konqueror and to create my own. In fact, Konqueror seems to ignore profiles now, even when I try to launch them from the command line. Perhaps the profile location has moved and the KDE 4 location no longer works, but that raises the question of why that change was made. Konqueror is my favorite file manager and being unable to use my own profiles with it is nearly a deal breaker all by itself.

The multimedia configuration page in System Settings was unable to detect any of the multiple soundcards I have installed in my workstation. This failure, along with the inability to deal with my preconfigured Konqueror profiles, makes it impossible for me to work effectively in KP5.

The KP5 desktop itself is usable but flat, boring and uninspiring. Perhaps simplicity and clean looks are the watchwords for this release but I don’t like it.

The real problem is the set of issues I had when trying to make the desktop look good for me. There are only two options for the default desktop look and no way to download more. There are also few icon options and again, no available downloads. KP5 does not recognize my existing wallpapers and forced me to import each one individually. Making changes to the desktop, such as pointer schemes and various modifications to application appearance, causes the desktop to crash repeatedly.

Underneath, there are no major changes to Fedora 22 itself. The major changes like systemd, the new Anaconda installer, and firewalld are well past. But the new Anaconda installer still sucks!

I did try to use various forms of GNOME, including Cinnamon and MATE, but I find those desktops too restrictive for me. So I went back to Fedora 21 with KDE Plasma 4 and I am now happy again. I will wait until KP5 is fixed before I upgrade to a newer version of Fedora – just as I did when KP4 made its appearance.

Ironically, this decision to revert to Fedora 21 is a reflection of my own somewhat inflexible approach to my desire for a flexible desktop experience. I like KDE Plasma 4 and the extreme flexibility it gives me. I find myself hampered and seriously annoyed by the lack of the features and flexibility in KP5 that I have grown used to in KP4 and I am unwilling to deal with those shortcomings for any length of time.

My main question is why would one release a desktop so seriously full of holes and annoyances?

Millennium Technology Consulting LLC Dissolved

For a number of reasons, I am closing down the business entity known as Millennium Technology Consulting LLC effective immediately.

I will continue to maintain this DataBook® web site, where I post technical information for Linux system administrators and end users. If you are looking for help with Linux and other Free Open Source Software (FOSS), I post information here that – for me at least – was difficult to find or that took me a lot of time to discover through experimentation.

Because that business subsidised the operation of this web site, that source of financial support is no longer available. So, if you find this web site useful, I ask you to consider supporting it by donating so that it may continue to exist.

Thank you.

Maintenance outages today, January 08, 2014

I will be performing some emergency maintenance today, to replace a couple old and failing UPS units. The batteries are OK, but the units themselves are failing after several years.

There will be a few short outages of the email and web sites during this maintenance.

Thanks for your patience.

David Both

It helps to know how things work

It really helps to know how things work when it becomes necessary to fix them.

This was true when I was fixing audio equipment in the early ’70s, and supporting computers and software for IBM, MCI, Interpath, and Cisco over the years, and teaching Linux for Red Hat and my own company, Millennium Technology Consulting LLC. The intimate knowledge of how Linux works has also been invaluable since I started working with it in about 1996.

Unless you know how things really work, there is a tendency to use a shotgun approach to problem solving. That wastes time and, if replacing parts is involved or purchasing new software, can be quite expensive.

After all, would you be willing to pay for the auto mechanic to replace several perfectly good parts while trying to find the one part actually causing the problem – and to pay him for time and materials as well? Of course not. Although that used to be the case more often than it should have been.

I submit for your approval a problem I just fixed this morning – with this web site.

It was not a problem that affected the external operation of the DataBook web site, but I could no longer use any editor from within WordPress to edit pages and posts such as this one.

Because I know several important things about WordPress I was able to think about the problem and correct it on the first try. I know the following about WordPress:

  • The data for WordPress web sites is stored separately in a MySQL database. Separation of data and code is always a good thing to do.
  • There is one, and only one, small site configuration file for each WordPress web site, wp-config.php.
  • All WordPress plugins, themes, and uploaded graphics also have their own directories.
  • The Apache web configuration is separate from the WordPress site configuration.

So it was a simple matter to delete the entire directory in which the WordPress instance was installed for that web site. Everything.

I then copied the entire directory structure from a known working web site to replace the one I deleted. I then copied the original wp-config.php to the appropriate location in the newly copied WordPress directory structure and my web site was up and running again. It was then trivial to copy from backups the rest of the plugins and graphics to complete the process. All in all it took less than 5 minutes.
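For anyone who wants to see the idea in code, here is a minimal sketch of that delete-and-replace procedure. It is not the exact sequence of commands I typed. The function name and the directory arguments are hypothetical, and it assumes the original wp-config.php is still readable in the broken site directory, so it saves a copy of that file before deleting anything. Plugins and uploaded graphics would still need to be restored from backups afterward.

```python
import shutil
import tempfile
from pathlib import Path


def rebuild_wordpress(broken_site, known_good_site):
    """Replace a broken WordPress directory with a copy of a working one.

    Hypothetical helper: both arguments are paths to WordPress install
    directories. The broken site's own wp-config.php is preserved so the
    rebuilt site still points at its own database.
    """
    broken_site = Path(broken_site)
    known_good_site = Path(known_good_site)

    with tempfile.TemporaryDirectory() as scratch:
        # Save the one file that is unique to this site.
        saved_config = Path(scratch) / "wp-config.php"
        shutil.copy2(broken_site / "wp-config.php", saved_config)

        # Delete the entire broken instance. Everything.
        shutil.rmtree(broken_site)

        # Copy the whole directory structure from the known working site.
        shutil.copytree(known_good_site, broken_site)

        # Put the original site configuration back in place.
        shutil.copy2(saved_config, broken_site / "wp-config.php")
```

This only works because the site content lives in the MySQL database and not in the directory being deleted. Try it on scratch directories first, and make sure the backups are good before pointing it at a real site.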

Without the understanding I have of how WordPress, MySQL and Apache work together to produce a web site, I would have been tempted to simply delete everything in the WordPress directory (/var/www) for that web site and start over by reinstalling WordPress and configuring it from scratch. As easy as that is for WordPress, it would still have taken much longer than it did for me to actually fix the problem.

If I had understood more about the PHP coding of WordPress itself, I probably could have simply repaired the offending file that was likely corrupted for some reason. But that would probably have taken much longer in any event.

If you are interested in learning how Linux works so that you can identify, understand and fix problems in the most effective ways, try the Linux classes I offer at Millennium Technology Consulting LLC.

CentOS 7.0 released

CentOS 7 was released today, July 7.

CentOS is identical to Red Hat Enterprise Linux (RHEL) with the only exception being the branding text and graphics. CentOS is a fully supported Community ENTerprise Operating System that provides free upgrades and support.

CentOS 7 incorporates several major changes and enhancements. These include things like systemd and GNOME 3. In addition, the XFS file system is now the default.

Many of the new features in CentOS, such as systemd, have been around for a couple years, most notably in the Fedora distribution. Fedora is the upstream feeder to RHEL and many new RHEL features are first introduced in Fedora.

See http://www.centos.org/ for more details about CentOS 7.