Defective hard disk?

marcomotta · 10 May 2020, 11:57am

I've just bought a new hard disk (it arrived yesterday) and ran a full SMART test: neither the short test nor the long one seems to have found any problems, or at least the disk status shows as OK. Now I'm copying the data over from the old hard disk.
What makes me suspicious is that, while the files are copying (apparently without problems), the command

$ dmesg -w

shows several errors such as

[code]res 40/00:68:00:e8:73/00:00:74:00:00/40 Emask 0x10 (ATA bus error)
[ 2796.761179] ata1.00: status: { DRDY }
[ 2796.761180] ata1.00: failed command: WRITE FPDMA QUEUED
[ 2796.761184] ata1.00: cmd 61/00:58:00:e0:73/04:00:74:00:00/40 tag 11 ncq 52428[/code]
or

[code]res 40/00:48:00:34:33/00:00:96:00:00/40 Emask 0x10 (ATA bus error)
[ 4920.902369] ata1.00: status: { DRDY }
[ 4920.902372] ata1: hard resetting link
[ 4921.363030] ata1: SATA link up 3.0 Gbps (SStatus 123 SControl 300)
[ 4921.366734] ata1.00: configured for UDMA/133
[ 4921.366878] ata1: EH complete[/code]
It doesn't happen continuously, but I've seen several of them.
Should I conclude that the hard disk is defective?
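To see how often these messages show up, the dmesg output can be tallied per ATA port. The following is a minimal Python sketch, assuming the usual libata message format shown in the logs above; the names `count_ata_events`, `PATTERNS` and `SAMPLE` are hypothetical, and the `ata1.00:` prefix on the bus-error line is an assumption because it was cut off in the pasted log.

```python
import re
from collections import Counter, defaultdict
from typing import Dict

# Hypothetical mapping: event name -> substring of the libata message.
PATTERNS = {
    "bus_error": "(ATA bus error)",
    "failed_command": "failed command:",
    "hard_reset": "hard resetting link",
}

# Port prefix such as "ata1:" or "ata1.00:"; group 1 captures "ata1".
PORT_RE = re.compile(r"\b(ata\d+)(?:\.\d+)?:")


def count_ata_events(dmesg_text: str) -> Dict[str, Counter]:
    """Tally bus errors, failed commands and link resets per ATA port."""
    counts: Dict[str, Counter] = defaultdict(Counter)
    for line in dmesg_text.splitlines():
        match = PORT_RE.search(line)
        if match is None:
            continue  # not a libata port message
        port = match.group(1)
        for name, needle in PATTERNS.items():
            if needle in line:
                counts[port][name] += 1
    return dict(counts)


# Lines taken from the post; the "ata1.00:" prefix of the first line is
# assumed, since it was cut off in the pasted log.
SAMPLE = """\
ata1.00: res 40/00:48:00:34:33/00:00:96:00:00/40 Emask 0x10 (ATA bus error)
[ 2796.761180] ata1.00: failed command: WRITE FPDMA QUEUED
[ 4920.902372] ata1: hard resetting link
[ 4921.363030] ata1: SATA link up 3.0 Gbps (SStatus 123 SControl 300)
"""

if __name__ == "__main__":
    print(count_ata_events(SAMPLE))
```

On a live system the text could come from `subprocess.run(["dmesg"], capture_output=True, text=True).stdout`, which may require root depending on the `kernel.dmesg_restrict` setting.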

Here is the smartctl output:

[code]# smartctl -a /dev/sda
smartctl 6.2 2014-07-16 r3952 [x86_64-linux-3.17.2-200.fc20.x86_64] (local build)
Copyright © 2002-13, Bruce Allen, Christian Franke, www.smartmontools.org

=== START OF INFORMATION SECTION ===
Model Family: Seagate Desktop HDD.15
Device Model: ST4000DM000-1F2168
Serial Number: Z3029HVF
LU WWN Device Id: 5 000c50 0792e018c
Firmware Version: CC52
User Capacity: 4,000,787,030,016 bytes [4,00 TB]
Sector Sizes: 512 bytes logical, 4096 bytes physical
Rotation Rate: 5900 rpm
Device is: In smartctl database [for details use: -P show]
ATA Version is: ATA8-ACS T13/1699-D revision 4
SATA Version is: SATA 3.1, 6.0 Gb/s (current: 3.0 Gb/s)
Local Time is: Thu Nov 20 12:31:57 2014 CET
SMART support is: Available - device has SMART capability.
SMART support is: Enabled

=== START OF READ SMART DATA SECTION ===
SMART overall-health self-assessment test result: PASSED

General SMART Values:
Offline data collection status:  (0x00) Offline data collection activity
                                        was never started.
                                        Auto Offline Data Collection: Disabled.
Self-test execution status:        ( 0) The previous self-test routine completed
                                        without error or no self-test has ever
                                        been run.
Total time to complete Offline
data collection:                 ( 107) seconds.
Offline data collection
capabilities:                    (0x73) SMART execute Offline immediate.
                                        Auto Offline data collection on/off support.
                                        Suspend Offline collection upon new
                                        command.
                                        No Offline surface scan supported.
                                        Self-test supported.
                                        Conveyance Self-test supported.
                                        Selective Self-test supported.
SMART capabilities:            (0x0003) Saves SMART data before entering
                                        power-saving mode.
                                        Supports SMART auto save timer.
Error logging capability:        (0x01) Error logging supported.
                                        General Purpose Logging supported.
Short self-test routine
recommended polling time:          ( 1) minutes.
Extended self-test routine
recommended polling time:        ( 529) minutes.
Conveyance self-test routine
recommended polling time:          ( 2) minutes.
SCT capabilities:              (0x1085) SCT Status supported.

SMART Attributes Data Structure revision number: 10
Vendor Specific SMART Attributes with Thresholds:
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
  1 Raw_Read_Error_Rate     0x000f   103   099   006    Pre-fail  Always       -       5960632
  3 Spin_Up_Time            0x0003   096   096   000    Pre-fail  Always       -       0
  4 Start_Stop_Count        0x0032   100   100   020    Old_age   Always       -       4
  5 Reallocated_Sector_Ct   0x0033   100   100   010    Pre-fail  Always       -       0
  7 Seek_Error_Rate         0x000f   100   253   030    Pre-fail  Always       -       78688
  9 Power_On_Hours          0x0032   100   100   000    Old_age   Always       -       15
 10 Spin_Retry_Count        0x0013   100   100   097    Pre-fail  Always       -       0
 12 Power_Cycle_Count       0x0032   100   100   020    Old_age   Always       -       4
183 Runtime_Bad_Block       0x0032   099   099   000    Old_age   Always       -       1
184 End-to-End_Error        0x0032   100   100   099    Old_age   Always       -       0
187 Reported_Uncorrect      0x0032   100   100   000    Old_age   Always       -       0
188 Command_Timeout         0x0032   100   100   000    Old_age   Always       -       0 0 0
189 High_Fly_Writes         0x003a   100   100   000    Old_age   Always       -       0
190 Airflow_Temperature_Cel 0x0022   070   068   045    Old_age   Always       -       30 (Min/Max 26/31)
191 G-Sense_Error_Rate      0x0032   100   100   000    Old_age   Always       -       0
192 Power-Off_Retract_Count 0x0032   100   100   000    Old_age   Always       -       0
193 Load_Cycle_Count        0x0032   100   100   000    Old_age   Always       -       32
194 Temperature_Celsius     0x0022   030   040   000    Old_age   Always       -       30 (0 21 0 0 0)
197 Current_Pending_Sector  0x0012   100   100   000    Old_age   Always       -       0
198 Offline_Uncorrectable   0x0010   100   100   000    Old_age   Offline      -       0
199 UDMA_CRC_Error_Count    0x003e   200   200   000    Old_age   Always       -       14
240 Head_Flying_Hours       0x0000   100   253   000    Old_age   Offline      -       13h+29m+41.227s
241 Total_LBAs_Written      0x0000   100   253   000    Old_age   Offline      -       2369657232
242 Total_LBAs_Read         0x0000   100   253   000    Old_age   Offline      -       420445

SMART Error Log Version: 1
No Errors Logged

SMART Self-test log structure revision number 1
Num  Test_Description    Status                    Remaining  LifeTime(hours)  LBA_of_first_error
  1  Extended offline    Completed without error         00%                9  -
  2  Extended offline    Interrupted (host reset)        00%                0  -
  3  Short offline       Completed without error         00%                0  -
  4  Extended offline    Interrupted (host reset)        00%                0  -

SMART Selective self-test log data structure revision number 1
SPAN MIN_LBA MAX_LBA CURRENT_TEST_STATUS
1 0 0 Not_testing
2 0 0 Not_testing
3 0 0 Not_testing
4 0 0 Not_testing
5 0 0 Not_testing
Selective self-test flags (0x0):
After scanning selected spans, do NOT read-scan remainder of disk.
If Selective self-test is pending on power-up, resume after 0 minute delay.[/code]

Moreno · 10 May 2020, 11:57am

Hi

It's strange that SMART isn't reporting any errors. I found something related here: https://bugzilla.redhat.com/show_bug.cgi?id=917826

Try running the extended SMART tests; maybe something will show up.

I recently had serious problems with a kernel that kept dropping one of my two SATA III disks, but it was an RC kernel, and it doesn't do that anymore.

Bye bye, Moreno

marcomotta · 10 May 2020, 11:57am

Actually, it does report something:

[code]# smartctl -a /dev/sda | grep CRC
199 UDMA_CRC_Error_Count 0x003e 200 200 000 Old_age Always - 19[/code]
(a little while ago it was 14!)
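The same value can be pulled out without grep by parsing the attribute table. A minimal sketch, assuming the standard `smartctl -A` column layout (ID#, ATTRIBUTE_NAME, FLAG, VALUE, WORST, THRESH, TYPE, UPDATED, WHEN_FAILED, RAW_VALUE); `smart_raw_value` is a hypothetical helper name. Only the raw value is useful here: for UDMA_CRC_Error_Count the normalized VALUE stays at 200 while the raw count rises from 14 to 19.

```python
import re
from typing import Optional


def smart_raw_value(smartctl_output: str, attr_name: str) -> Optional[int]:
    """Return the leading integer of RAW_VALUE for attr_name, or None."""
    for line in smartctl_output.splitlines():
        # Columns: ID# ATTRIBUTE_NAME FLAG VALUE WORST THRESH TYPE UPDATED
        # WHEN_FAILED RAW_VALUE. RAW_VALUE may contain spaces, e.g.
        # "30 (Min/Max 26/31)", so split at most 9 times.
        fields = line.split(maxsplit=9)
        if len(fields) == 10 and fields[1] == attr_name:
            match = re.match(r"\d+", fields[9])
            return int(match.group()) if match else None
    return None


if __name__ == "__main__":
    line = "199 UDMA_CRC_Error_Count 0x003e 200 200 000 Old_age Always - 19"
    print(smart_raw_value(line, "UDMA_CRC_Error_Count"))  # 19
```

Calling it periodically on the output of `smartctl -A /dev/sda` (usually as root) shows whether the counter is still climbing.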

N.B. I ran the extended test yesterday.

marcomotta · 10 May 2020, 11:57am

I found this at http://lime-technology.com/forum/index.php?topic=16186.0:

[quote]That error is usually a bad SATA cable or a SATA cable wrapped around a power wire.
Change that out and see if it keeps counting up.

The other option is to swap connections with a “good drive” and see if the errors change drives or follow the drive.
If it follows the drive, I'd think it is a bad internal connector on the drive.[/quote]

Could it be the cable? I hope so, because I waited more than a month for them to send me this disk as a replacement for the previous one, which didn't work, and I really don't feel like going through another RMA…

Moreno · 10 May 2020, 11:57am

Hi

For that error, the thread at http://hardforum.com/showthread.php?t=1558825 recommends replacing the cable.
Trying costs nothing.

Anyway, it doesn't seem to be a serious problem, and it could be related to the message you're seeing in dmesg.

Bye bye, Moreno

marcomotta · 10 May 2020, 11:57am

The problem is definitely related to the dmesg message: it repeats 12-13 times, then comes "hard resetting link", and after that UDMA_CRC_Error_Count goes up by 1.
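The pattern described here (a burst of bus errors, then a link reset, then the CRC counter going up by one) can be checked against a saved dmesg log. A rough sketch, assuming that one link reset corresponds to one CRC increment; that is the observation reported in this thread, not documented kernel or drive behaviour. The names `error_bursts` and `bursts_match_crc` are hypothetical, and the log in the usage example is synthetic.

```python
from typing import List


def error_bursts(dmesg_text: str) -> List[int]:
    """Return the number of '(ATA bus error)' lines preceding each link reset."""
    bursts: List[int] = []
    pending = 0
    for line in dmesg_text.splitlines():
        if "(ATA bus error)" in line:
            pending += 1
        elif "hard resetting link" in line:
            bursts.append(pending)  # a reset closes the current burst
            pending = 0
    # Errors after the last reset are not a complete burst yet and are ignored.
    return bursts


def bursts_match_crc(dmesg_text: str, crc_before: int, crc_after: int) -> bool:
    """Heuristic (assumption): one link reset per UDMA_CRC_Error_Count increment."""
    return len(error_bursts(dmesg_text)) == crc_after - crc_before


if __name__ == "__main__":
    # Synthetic log: a burst of 12 errors, a reset, 13 errors, another reset.
    log = "\n".join(
        ["ata1.00: ... (ATA bus error)"] * 12
        + ["ata1: hard resetting link"]
        + ["ata1.00: ... (ATA bus error)"] * 13
        + ["ata1: hard resetting link"]
    )
    print(error_bursts(log))  # [12, 13]
    print(bursts_match_crc(log, 19, 21))  # True
```

The function ignores which port each message came from, so on a machine with several disks the log should be filtered to the relevant `ataN` port first.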
As for the cable, I had actually "borrowed" the DVD drive's cable, hooking the hard disk up as best I could, resting outside the case, just to move the data; its final destination, though, is an eSATA enclosure with its own cable.
In the meantime I've moved it there (although, thinking about it, since it's still SATA the transfer speed will probably be the same). Now let's see whether the problem comes back: for the moment UDMA_CRC_Error_Count is holding at 21, so it looks like it doesn't, but I only started copying again about fifteen minutes ago.