IT Guys, I need your help ASAP



t_jolt
Mon Jul 20th, 2009, 09:30 AM
So one of our core IBM servers is starting to report lots of hardware errors. I was wondering if any of you know how to run hardware diagnostics on an IBM pSeries 9111-520 running AIX 5L, level 5.3.0.0.

Thanks
Tyrel

Graybird
Mon Jul 20th, 2009, 09:35 AM
OMG, you too... that sounds a lot worse than a router problem

fook
Mon Jul 20th, 2009, 10:05 AM
What does errpt show, for starters?

errpt | more -> do you see any hardware, sysplanar0, disk, or adapter failures?

If so, run errpt -a -j <identifier>, where <identifier> is the alphanumeric sequence in the first column, and post one of those entries here.

From there, depending on what the error is, you can go into diag (the "diag" command) and sometimes get a better idea.
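
If the errpt list is long, a rough sketch like this can tally the hardware-class entries first. It assumes the usual errpt summary column order (IDENTIFIER, TIMESTAMP, T, C, RESOURCE_NAME, DESCRIPTION); the function name hw_error_counts is made up for illustration and is not an AIX command.

```shell
# Tally hardware-class entries (C column = H) from `errpt` summary output.
# Reads the summary on stdin and prints: IDENTIFIER RESOURCE TYPE COUNT
# Assumes the columns are IDENTIFIER TIMESTAMP T C RESOURCE_NAME DESCRIPTION.
hw_error_counts() {
    awk '$1 != "IDENTIFIER" && $4 == "H" { n[$1 " " $5 " " $3]++ }
         END { for (k in n) print k, n[k] }' | sort
}

# Intended use on the AIX box (not run here):
#   errpt | hw_error_counts
#   errpt -a -j <identifier> | more
```

A lot of T (TEMP) hits against one disk is the pattern to watch for; a P (PERM) entry is more urgent.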

Do you have a service contract with IBM or a vendor?

t_jolt
Mon Jul 20th, 2009, 10:11 AM
Nada, just found out this morning we don't have one. The last IT director cancelled the contracts when he left! (wtf???) So I'm trying to figure a bunch of stuff out at once.

I'll post once I run the commands.

Any ideas for support vendors?

Thanks
Tyrel

t_jolt
Mon Jul 20th, 2009, 10:15 AM
LABEL: DISK_ERR4
IDENTIFIER: 49A83216

Date/Time: Sun Jul 19 22:18:01 MDT 2009
Sequence Number: 8550
Machine Id: 00CA486D4C00
Node Id: idxhost
Class: H
Type: TEMP
Resource Name: hdisk0
Resource Class: disk
Resource Type: scsd
Location: U787A.001.DNZ08D7-P1-C1-T1-L5-L0
VPD:
Manufacturer................IBM
Machine Type and Model......ST336753LC
FRU Number..................00P2693
ROS Level and ID............43353141
Serial Number...............0001C110
EC Level....................H12094
Part Number.................00P2692
Device Specific.(Z0)........000003129F00013E
Device Specific.(Z1)........0626C51A
Device Specific.(Z2)........0002
Device Specific.(Z3)........05233
Device Specific.(Z4)........0001
Device Specific.(Z5)........22
Device Specific.(Z6)........H12094

Description
DISK OPERATION ERROR

Probable Causes
MEDIA
DASD DEVICE

t_jolt
Mon Jul 20th, 2009, 10:16 AM
WRT the hard drive, I'd say that sounds like disk failure. This presents a problem:

hdisk0 and hdisk1 are roughly 34 GB each and are the only drives in the roughly 69 GB rootvg volume group.
Basically the entire operating system is stored in rootvg, and it does not appear to me to be redundant.
I believe every hard drive bay in idxhost is taken, meaning we cannot add any more drives.

Assuming I am correct, this is a serious problem.

TurboGizzmo
Mon Jul 20th, 2009, 10:35 AM
What is this server running? (As in, can it be taken offline for any amount of time?) hdisk0 and hdisk1 aren't mirrored or anything?

fook
Mon Jul 20th, 2009, 10:38 AM
Yeah, that's a TEMP disk error, which is certainly better than PERM, but it's usually indicative of a coming failure at some point if they are occurring a lot. It could also be a firmware issue, though, if you recently changed anything (oslevel etc.).

Two disks in rootvg? rootvg not mirrored?
lsvg -p rootvg shows the physical volumes in rootvg.
lsvg -l rootvg shows how many PVs each LV has. A mirrored rootvg should have 2 for almost everything except dump devices and sometimes paging.

How full is rootvg? Maybe you can move all the LVs onto hdisk0 and then mirror to hdisk1 so you don't encounter a failure if/when a disk drops out.
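
For the mirror check, here is a minimal sketch that reads lsvg -l output and lists LVs that look unmirrored. It compares PPs to LPs rather than counting PVs, on the assumption that a mirrored LV has at least twice as many PPs as LPs (an LV spread across two disks can show 2 PVs without being mirrored). The name unmirrored_lvs is hypothetical, and the column order assumed is LV NAME, TYPE, LPs, PPs, PVs, LV STATE, MOUNT POINT.

```shell
# List logical volumes that do not appear to have a second copy.
# Reads `lsvg -l <vg>` output on stdin. Only rows whose 3rd and 4th fields
# are numeric (LPs and PPs) are considered, which skips the VG name line
# and the header line.
unmirrored_lvs() {
    awk '$3 ~ /^[0-9]+$/ && $4 ~ /^[0-9]+$/ && ($4 + 0) < 2 * $3 { print $1 }'
}

# Intended use on the AIX box (not run here):
#   lsvg -l rootvg | unmirrored_lvs
```

Dump devices and sometimes paging space are expected to show up in that list even on a properly mirrored rootvg.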

JustSomeDude
Mon Jul 20th, 2009, 10:46 AM
C:diagnostics

works every time! :drink:

fook
Mon Jul 20th, 2009, 10:50 AM
Couple things:

1: Make sure you guys have a current mksysb image backup of this box in case you lose a disk sometime soon. I'm hoping you have a NIM server? If not, a tape drive to take the mksysb to?

2: Not sure of your AIX skillset, but you can check the space on each drive easily with lspv <hdiskX>. If one disk has enough free space to hold the contents of the other, you can move everything off one and then mirror to the other while the system is live, although considering you're dealing with a potentially dying disk there is risk.
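
The space check in point 2 could be sketched roughly like this. It assumes lspv <hdiskX> prints "FREE PPs:" and "USED PPs:" lines with the count right after the label, and that both disks sit in the same volume group so their PP size matches. The helper names pp_count and can_evacuate are made up for illustration, and the sketch only checks space; it does not move or mirror anything.

```shell
# Print the PP count that follows a label such as "FREE PPs" or "USED PPs"
# in `lspv <hdiskX>` output read from stdin.
pp_count() {
    sed -n "s/^$1:[[:space:]]*\([0-9][0-9]*\).*/\1/p"
}

# Succeed if the target disk has at least as many free PPs as the source
# disk has used PPs. Usage: can_evacuate <source hdisk> <target hdisk>
can_evacuate() {
    used=$(lspv "$1" | pp_count "USED PPs")
    free=$(lspv "$2" | pp_count "FREE PPs")
    [ -n "$used" ] && [ -n "$free" ] && [ "$free" -ge "$used" ]
}

# Intended use on the AIX box (not run here):
#   can_evacuate hdisk1 hdisk0 && echo "hdisk0 has room for hdisk1's contents"
```

Comparing PP counts instead of megabytes avoids rounding; swap the arguments to check the move in the other direction.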

t_jolt
Mon Jul 20th, 2009, 10:51 AM
Not mirrored at all.
The server is running our entire billing and scheduling system (60+ doctor practice).

We are only using about 20 GB, so we can move it all to one drive, but our only Linux guy does not know how to do this... So we need to bring someone in, unless someone is willing to give me steps :) and I will repay them with liquor.

We have a tape drive with a current mksysb image.

fook
Mon Jul 20th, 2009, 10:54 AM
I can send you detailed instructions on getting that thing mirrored. PM me an email address.

TurboGizzmo
Mon Jul 20th, 2009, 10:56 AM
Sounds like sleepover time... It would be risky to play around too much during the middle of the day.

I was poking around this forum: https://www.ibm.com/developerworks/forums/thread.jspa?messageID=13821657&#13821657

But it looks like fook is your current million-dollar saving grace :)

t_jolt
Mon Jul 20th, 2009, 11:00 AM
pm sent